<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Patima Poochai</title>
    <description>The latest articles on DEV Community by Patima Poochai (@patimapoochai).</description>
    <link>https://dev.to/patimapoochai</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2597891%2F4d4ff6e6-fdc2-4fa7-94ef-79f93c94bece.png</url>
      <title>DEV Community: Patima Poochai</title>
      <link>https://dev.to/patimapoochai</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/patimapoochai"/>
    <language>en</language>
    <item>
      <title>How To Resize LVM Volumes Dynamically Using Cloud-Init (Step-By-Step)</title>
      <dc:creator>Patima Poochai</dc:creator>
      <pubDate>Thu, 26 Mar 2026 08:56:42 +0000</pubDate>
      <link>https://dev.to/patimapoochai/how-to-resize-lvm-volumes-dynamically-in-cloud-init-vms-step-by-step-1oml</link>
      <guid>https://dev.to/patimapoochai/how-to-resize-lvm-volumes-dynamically-in-cloud-init-vms-step-by-step-1oml</guid>
      <description>&lt;p&gt;Logical Volume Manager (LVM) is the preferred storage framework for Linux VMs. It's easy to resize LVM volumes, migrate data between devices, and combine multiple physical disks into one easy-to-mange volume. These features are especially useful in a virtualized environment, as they enable you to do these tasks without physical access to the machine.&lt;/p&gt;

&lt;p&gt;When creating VM templates, it's common to create a fixed-size volume for the root partition and &lt;strong&gt;use cloud-init to grow the partition size dynamically when cloning the template&lt;/strong&gt; (unless you want to &lt;em&gt;create multiple templates&lt;/em&gt; with each one having 10G, 11G, 12G... storage and so on). However, you can't simply use the built-in growpart module to resize logical volumes. It &lt;a href="https://forum.proxmox.com/threads/cloud-init-lvm-resize-not-working.68947/" rel="noopener noreferrer"&gt;only works on partitions&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;But that doesn't mean you should avoid using LVM in your templates. After a few trials and tribulations, I've found a workaround that allows cloud-init to automatically resize logical volumes, as well as a few pitfalls you should avoid when using cloud-init with Debian machines. Here's what I've learned.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fur32gebhokmal55curod.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fur32gebhokmal55curod.gif" alt="Animated GIF depicting a fight between Debian and LVM as the fight between the Boss and Snake from Metal Gear Solid 3." width="450" height="253"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Two Steps to Grow LVM Volumes with Cloud-Init
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcaspg3ugkp3vo05s7kcc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcaspg3ugkp3vo05s7kcc.png" alt="The partition configuration of the Linux VM. The third partition has 30 GB of storage." width="428" height="117"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here's the scenario. There are three partitions on my VM template, with &lt;code&gt;/dev/sda3&lt;/code&gt; containing the root logical volume (LV) named &lt;code&gt;ubuntu--vg-ubuntu--lv&lt;/code&gt;. The VM's disk originally had 30 GB of storage, but I've resized the drive to 70 GB. The root LV is still 30 GB, so I need cloud-init to automatically resize the volume to fill the remaining space.&lt;/p&gt;

&lt;h3&gt;
  
  
  Grow the Partition
&lt;/h3&gt;

&lt;p&gt;First, we need to &lt;strong&gt;grow the partition&lt;/strong&gt; using the &lt;a href="https://docs.cloud-init.io/en/latest/reference/modules.html#growpart" rel="noopener noreferrer"&gt;growpart&lt;/a&gt; module. While we can't use growpart directly on the root LV, we can use it to grow the partition that contains the underlying physical volume (PV) of the root volume.&lt;/p&gt;

&lt;p&gt;Create a cloud-init configuration in &lt;code&gt;/etc/cloud/cloud.cfg.d/90-LVM.cfg&lt;/code&gt;, and add the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;growpart&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;devices&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;/dev/sda3&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To test the config, clear cloud-init's state so it runs again on the next boot and reboot the machine (the &lt;code&gt;-r&lt;/code&gt; flag reboots after cleaning):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;cloud-init clean &lt;span class="nt"&gt;-r&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here's how the partitions look after growpart.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk94jaijq276ag6b3ssps.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk94jaijq276ag6b3ssps.png" alt="The partitions of the VM, now with the third partition having 68 GB." width="425" height="123"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Growpart expanded &lt;code&gt;/dev/sda3&lt;/code&gt; to 68 GB using the newly added free space in the storage disk.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8sybz46otn4igfvansb8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8sybz46otn4igfvansb8.png" alt="PVS command showing that the PV of the root volume was expanded." width="380" height="43"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Because the &lt;code&gt;sda3&lt;/code&gt; partition is mapped to the physical volume backing the root volume group, growing the partition also expanded the underlying PV. We can now expand the LV to fill the free space in the volume group.&lt;/p&gt;

&lt;h3&gt;
  
  
  Grow the Root Volume
&lt;/h3&gt;

&lt;p&gt;Second, we need to &lt;strong&gt;grow the logical volume&lt;/strong&gt;. There is no built-in cloud-init module to manage LVM volumes, but we can use the &lt;a href="https://docs.cloud-init.io/en/latest/reference/modules.html#runcmd" rel="noopener noreferrer"&gt;runcmd&lt;/a&gt; module to execute the shell commands to resize the logical volume.&lt;/p&gt;

&lt;p&gt;One caveat is that we have to make sure runcmd only triggers after the growpart module. Otherwise, we would be expanding the LV before the PV has grown. We can check the module execution order by looking at the default cloud-init config at &lt;code&gt;/etc/cloud/cloud.cfg&lt;/code&gt; and making sure that &lt;code&gt;runcmd&lt;/code&gt; is executed after &lt;code&gt;growpart&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Favsjrtpee3wbjcsevtzu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Favsjrtpee3wbjcsevtzu.png" alt="Snippet of the boot stages and their modules." width="351" height="487"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Modules in the "config" stage will always run after the "init" stage, so make sure that runcmd is under &lt;code&gt;cloud_config_modules&lt;/code&gt;.&lt;/p&gt;
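&lt;p&gt;For reference, the relevant part of a stock &lt;code&gt;/etc/cloud/cloud.cfg&lt;/code&gt; looks roughly like this. This is an abridged sketch; the exact module lists vary by distro and cloud-init version:&lt;/p&gt;

```yaml
# Abridged sketch of /etc/cloud/cloud.cfg; actual module lists vary by version.
cloud_init_modules:        # "init" stage
  - growpart               # grows the partition first
  # ...
cloud_config_modules:      # "config" stage, always after "init"
  - runcmd                 # queues our lvresize command
  # ...
cloud_final_modules:       # "final" stage
  - scripts-user           # actually executes what runcmd queued
  # ...
```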

&lt;blockquote&gt;
&lt;p&gt;Fun fact: runcmd actually &lt;a href="https://docs.cloud-init.io/en/latest/reference/modules.html#runcmd" rel="noopener noreferrer"&gt;defers the execution until the scripts_user module in the "final" boot stage&lt;/a&gt;, so make sure that &lt;code&gt;scripts_user&lt;/code&gt; is under &lt;code&gt;cloud_final_modules&lt;/code&gt; as well.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fat6xotmjok3hfp03nsj7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fat6xotmjok3hfp03nsj7.png" alt="Excerpt from the official documentation describing how the runcmd module runs its script in the final boot stage." width="800" height="72"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now, we can add a runcmd entry to &lt;code&gt;/etc/cloud/cloud.cfg.d/90-LVM.cfg&lt;/code&gt; that resizes the root LV. In my case, the root path (&lt;code&gt;/&lt;/code&gt;) of my VM is mounted on a logical volume named &lt;code&gt;ubuntu-lv&lt;/code&gt;, which lives in a volume group (VG) named &lt;code&gt;ubuntu-vg&lt;/code&gt;, so my runcmd looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# append to /etc/cloud/cloud.cfg.d/90-LVM.cfg&lt;/span&gt;
&lt;span class="na"&gt;runcmd&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;lvresize&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;-l&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;+100%FREE&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;-r&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;/dev/ubuntu-vg/ubuntu-lv&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This command resizes the volume using all of the remaining space (&lt;code&gt;+100%FREE&lt;/code&gt;) while also resizing the file system in the volume (&lt;code&gt;-r&lt;/code&gt;).&lt;/p&gt;
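&lt;p&gt;Putting both steps together, the complete &lt;code&gt;/etc/cloud/cloud.cfg.d/90-LVM.cfg&lt;/code&gt; from this walkthrough is only a few lines. Substitute your own partition, VG, and LV names:&lt;/p&gt;

```yaml
# /etc/cloud/cloud.cfg.d/90-LVM.cfg
# Grow the partition holding the PV, then grow the LV and its filesystem.
growpart:
  devices: [/dev/sda3]

runcmd:
  - [lvresize, -l, +100%FREE, -r, /dev/ubuntu-vg/ubuntu-lv]
```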

&lt;p&gt;Test the config again by rebooting the machine and rerunning cloud-init:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;cloud-init clean &lt;span class="nt"&gt;-r&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Upon reboot, cloud-init should resize the root LV to use the remaining free space in the storage disk.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6y23l5tts0ig7x8n459p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6y23l5tts0ig7x8n459p.png" alt="Partitions of the VM, now the root volume is using all of the available space." width="452" height="125"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Troubleshooting
&lt;/h3&gt;

&lt;p&gt;If cloud-init isn't resizing the volumes as expected, your first troubleshooting step should be checking the logs at &lt;code&gt;/var/log/cloud-init.log&lt;/code&gt;.&lt;/p&gt;
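&lt;p&gt;The log is long, so filtering it for warning and error lines gets you to the failure faster. Here's a sketch; the &lt;code&gt;printf&lt;/code&gt; lines stand in for the real log, so on a VM pipe in &lt;code&gt;cat /var/log/cloud-init.log&lt;/code&gt; instead:&lt;/p&gt;

```shell
# Surface problem lines from cloud-init's log. The two sample lines below
# stand in for the real contents of /var/log/cloud-init.log.
printf '%s\n' \
  'util.py[DEBUG]: Running module growpart' \
  "TypeError: Input to shellify was type 'str'. expected list or tuple" \
  | grep -iE 'warn|error|traceback'
```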

&lt;p&gt;For example, I had some confusion regarding the possible values for the runcmd module when I started working on this project. Look at this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq98uut02cwybbdpy5frv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq98uut02cwybbdpy5frv.png" alt="Snippet of the runcmd module with its schema." width="800" height="200"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Does this mean the module accepts 1) an array whose items are each an array of strings, a string, or null, or 2) either an array &lt;em&gt;or&lt;/em&gt; just a plain string? To get a practical understanding of the schema, I first tried using a simple string for the module.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F82aawnh4l0z51oerc5ou.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F82aawnh4l0z51oerc5ou.png" alt="A snippet of the runcmd module with only a string as its input." width="417" height="28"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After a reboot, cloud-init ran the configuration, but the LV was still the same size. Take a look at the logs:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0mui4hbywmcehq4z4426.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0mui4hbywmcehq4z4426.png" alt="Snippet of the cloud-init logs. One of the errors shows that the runcmd module did not expect a string as its input." width="800" height="309"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Note the &lt;code&gt;TypeError: Input to shellify was type 'str'. expected list or tuple&lt;/code&gt; line. This line tells us that the module expects the inputs to look like this in practice:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# this is correct&lt;/span&gt;
&lt;span class="s"&gt;Array [&lt;/span&gt;
  &lt;span class="s"&gt;- StringArray ["a", "b"]&lt;/span&gt;
  &lt;span class="s"&gt;- String "abc"&lt;/span&gt;
  &lt;span class="s"&gt;- Null &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;
&lt;span class="err"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# not this&lt;/span&gt;
&lt;span class="s"&gt;Array [] || String "abc"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
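&lt;p&gt;Translated into actual cloud-config YAML, the accepted and rejected shapes look like this (the commands are just placeholders):&lt;/p&gt;

```yaml
# Valid: runcmd's value is a list; each item is an argv-style list or a string.
runcmd:
  - [lvresize, -l, +100%FREE, -r, /dev/ubuntu-vg/ubuntu-lv]
  - echo "a plain string is fine as a list item"

# Invalid: a bare string as the module's value raises the shellify TypeError.
# runcmd: lvresize -l +100%FREE -r /dev/ubuntu-vg/ubuntu-lv
```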



&lt;p&gt;Niche issue? Probably. But if you need to write a more advanced runcmd config, reading &lt;code&gt;/var/log/cloud-init.log&lt;/code&gt; can help clarify some of the ambiguity.&lt;/p&gt;

&lt;h2&gt;
  
  
  Debian-specific Issues ("No free sectors")
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkb3v5d5oxhf5qywtmi5c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkb3v5d5oxhf5qywtmi5c.png" alt="Comedic depiction of Debian and LVM as the Boss and Snake from the game Metal Gear Solid 3. Debian is injuring LVM." width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The steps above should work for most cases. However, when using cloud-init and growpart to resize volumes in Debian VMs, you might run into the following error:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhrxslp55j689xyyozoeh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhrxslp55j689xyyozoeh.png" alt="Snippet of the growpart error. The root volume is at sda5, and a line in this error shows that the module failed to resize partition sda3." width="800" height="559"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Note the &lt;code&gt;/dev/sda3: No free sectors available&lt;/code&gt; line. Our root LV is located on &lt;code&gt;sda5&lt;/code&gt;, and growpart failed to resize that partition because it failed to resize &lt;code&gt;sda3&lt;/code&gt;. But... that partition &lt;em&gt;doesn't exist&lt;/em&gt;. Why is growpart trying to resize a non-existent partition?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F06m010hrxvzno7y3ydqq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F06m010hrxvzno7y3ydqq.png" alt="Snippet of the growpart documentation." width="800" height="289"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;According to the documentation, growpart will only resize the last partition on the disk, and "last" means last &lt;em&gt;numerically&lt;/em&gt;: the partition numbers cannot skip. If we want growpart to resize the partition holding our root LV, it has to be &lt;code&gt;sda3&lt;/code&gt; rather than &lt;code&gt;sda5&lt;/code&gt;. In other words, 1, 2, 3 and not 1, 2, 5.&lt;/p&gt;

&lt;p&gt;But why does our VM create the partitions in this way? It's because &lt;a href="https://wiki.debian.org/Partition#MBR_format_partitions"&gt;the MBR partitioning scheme puts LVM storage on a logical partition&lt;/a&gt;, and logical partitions always start at number 5, hence &lt;code&gt;sda5&lt;/code&gt;. A GPT partition scheme doesn't have this limitation, so we should change the partitioning scheme of our VM to use GPT instead.&lt;/p&gt;

&lt;p&gt;But how do you make Debian use GPT over MBR? By using UEFI. The Debian installer automatically decides the partitioning scheme &lt;a href="https://unix.stackexchange.com/questions/518377/where-does-the-debian-installer-choose-mbr-vs-gpt" rel="noopener noreferrer"&gt;based on whether you're using BIOS or UEFI&lt;/a&gt;. So if we want the partitions to be in order without skipping numbers, we have to complete the installation process with UEFI enabled.&lt;/p&gt;
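&lt;p&gt;If you're not sure which firmware an existing machine booted under, the kernel exposes it: the directory &lt;code&gt;/sys/firmware/efi&lt;/code&gt; only exists on UEFI boots. A quick check (a generic sketch, not specific to Proxmox):&lt;/p&gt;

```shell
# Prints UEFI when the machine booted with UEFI firmware, BIOS otherwise.
if [ -d /sys/firmware/efi ]; then echo UEFI; else echo BIOS; fi
```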

&lt;p&gt;Convoluted? Yes, but the fix is simple: &lt;strong&gt;use UEFI during installation&lt;/strong&gt;. In Proxmox, you can enable UEFI by changing the &lt;code&gt;BIOS&lt;/code&gt; option to &lt;code&gt;OVMF (UEFI)&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8pjxgf2n5yfa77za4k5y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8pjxgf2n5yfa77za4k5y.png" alt="Snippet of the BIOS setting in Proxmox. The BIOS is set to OVMF (UEFI)." width="294" height="159"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then add an EFI disk:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgvwrv5kcz5wezlntajxj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgvwrv5kcz5wezlntajxj.png" alt="Snippet of the hardware options in Proxmox. The " width="161" height="323"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then go through the installer as normal and choose &lt;code&gt;Guided - use entire disk and set up LVM&lt;/code&gt; during installation. After completing the installation, your partitions should now be in order:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7k68sf1a3cv78l24dlvk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7k68sf1a3cv78l24dlvk.png" alt="Partitions of the VMs, the root volume is now at sda3." width="411" height="145"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;These are the partitions of my Debian VM after using UEFI during installation. Note how the root LV is now at &lt;code&gt;/dev/sda3&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;I could now use growpart and runcmd to resize the root LV. This is my configuration for the Debian VM:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feq56htysicuwjix2tnwz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feq56htysicuwjix2tnwz.png" alt="The cloud-init config for the Debian VM. The config creates a default user called " width="397" height="255"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here's the result after rerunning cloud-init:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5eyqi9enx0vc6dh13uj4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5eyqi9enx0vc6dh13uj4.png" alt="Partitions of the Debian VM, now the root volume is using all of the available space on the storage disk." width="427" height="158"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Cloud-init resized the root LV without the &lt;code&gt;No free sectors&lt;/code&gt; issue.&lt;/p&gt;
&lt;h3&gt;
  
  
  Troubleshooting "status: disabled" and Unrecognized Cloud-init Drive Issues
&lt;/h3&gt;

&lt;p&gt;You might run into other issues with cloud-init while setting up a Debian VM. Here's a short walkthrough of how I troubleshot and resolved them.&lt;/p&gt;

&lt;p&gt;First, I got this error:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr5vcnnil4844kulmyjzj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr5vcnnil4844kulmyjzj.png" alt="Snippet of the " width="385" height="101"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Cloud-init wasn't running, and a status check told us that it had been disabled by cloud-init-generator, but not why.&lt;/p&gt;

&lt;p&gt;Maybe the source code of cloud-init-generator can tell us more about the cause of the issue. You can locate cloud-init-generator by listing the files installed by the cloud-init package:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;dpkg-query &lt;span class="nt"&gt;-L&lt;/span&gt; cloud-init | less
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The command shows all the files that were installed with the cloud-init package.  Look for &lt;code&gt;cloud-init-generator&lt;/code&gt;:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flr5mc509caeinr3t7n0j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flr5mc509caeinr3t7n0j.png" alt="Filtered result of the dpkg-query command. It shows that cloud-init-generator is located in the /usr/lib directory." width="451" height="30"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It's in &lt;code&gt;/usr/lib/systemd/system-generators/cloud-init-generator&lt;/code&gt;. Here's the first few lines of the file:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl709mtuo511k2mncyoi6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl709mtuo511k2mncyoi6.png" alt="Snippet of the system-generators file. One of the lines contains the location of the log file for this script." width="447" height="126"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Note the &lt;code&gt;LOG_F&lt;/code&gt; variable. That's the location of the log file where we can learn more about why cloud-init was disabled.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6l2peltpd8m2go7sc4aq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6l2peltpd8m2go7sc4aq.png" alt="Snippet of the log file. A line describes that the script ran but didn't find any data source." width="800" height="81"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Cloud-init used the &lt;code&gt;ds-identify&lt;/code&gt; component to identify data sources, and it couldn't find any valid configuration sources. However, I had attached a cloud-init drive to the VM via the Proxmox GUI, so what's going on?&lt;/p&gt;

&lt;h3&gt;
  
  
  Data Sources Formatting
&lt;/h3&gt;

&lt;p&gt;Let's check the logs related to ds-identify at &lt;code&gt;/run/cloud-init/ds-identify.log&lt;/code&gt; (also recommended by &lt;a href="https://docs.cloud-init.io/en/latest/howto/debugging.html#cloud-init-did-not-run" rel="noopener noreferrer"&gt;the documentation&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv1mjpso0tept57o0ic1g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv1mjpso0tept57o0ic1g.png" alt="A line from the log file describing how the field datasource_list wasn't found." width="800" height="37"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;From the &lt;code&gt;WARN: no datasource_list found&lt;/code&gt; message, it seems one of the problems is that ds-identify expects the config to define &lt;code&gt;datasource_list&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Here's how I changed the data sources configuration:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgycw79bmx19gjxv0n93g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgycw79bmx19gjxv0n93g.png" alt="Text snippet showing the wrong and right way to write the data sources section. The data sources list is on a single line with a list as the value." width="321" height="124"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And here's the output of &lt;code&gt;/run/cloud-init/ds-identify.log&lt;/code&gt; after applying this change:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffvgjw7gq7xt25ia5ldlv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffvgjw7gq7xt25ia5ldlv.png" alt="A line from the ds-identify log file. The script can now read the data sources list." width="594" height="43"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Cloud-init now detects the sources list, but it still doesn't recognize the cloud-init device. I was puzzled by this for a while, until I came across the cause a few days later.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cloud-init Drive Interface
&lt;/h3&gt;

&lt;p&gt;As of March 2026, according to &lt;a href="https://forum.proxmox.com/threads/cloud-init-drive-missing.174803/" rel="noopener noreferrer"&gt;this post&lt;/a&gt; and &lt;a href="https://forum.proxmox.com/threads/cicustom-cloud-init-stopped-working.164297/#post-759392" rel="noopener noreferrer"&gt;this post&lt;/a&gt;, there is a compatibility issue between IDE devices and OVMF. In practice, cloud-init drives that use IDE aren't recognized by the VM if you're using OVMF (UEFI). &lt;/p&gt;

&lt;p&gt;The fix is simple: &lt;strong&gt;use SCSI for your cloud-init drive&lt;/strong&gt;. When you're creating the cloud-init drive, choose the SCSI option in the Proxmox GUI:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvs87qara09011588plpv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvs87qara09011588plpv.png" alt="Proxmox GUI for the CloudInit Drive setting. The drive is set to SCSI." width="293" height="172"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For example, here are the hardware options for my VM:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F05fhp3qx17sjd9q5s1ir.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F05fhp3qx17sjd9q5s1ir.png" alt="The hardware options in Proxmox. The cloud-init drive is set to IDE." width="498" height="154"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here's the list of block devices recognized by the VM. Note how, despite the VM having two &lt;code&gt;cdrom&lt;/code&gt; drives, one of which is the cloud-init drive, only the CD/DVD drive shows up in the list.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq762jrmw1zhxuyxmf6wb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq762jrmw1zhxuyxmf6wb.png" alt="Block devices list of the VM. Cloud-init drive is not shown." width="431" height="125"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now, I changed the cloud-init drive to use SCSI:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwpd78g4ftxznkrm9v6ph.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwpd78g4ftxznkrm9v6ph.png" alt="The hardware options in Proxmox. The cloud-init drive is set to SCSI." width="594" height="177"&gt;&lt;/a&gt;&lt;/p&gt;
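&lt;p&gt;If you prefer the command line, the same change can be made with &lt;code&gt;qm&lt;/code&gt;. The VMID (9000) and storage name (&lt;code&gt;local-lvm&lt;/code&gt;) here are placeholders; substitute your own:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Remove the old IDE cloud-init drive, then attach one on the SCSI bus
qm set 9000 --delete ide2
qm set 9000 --scsi1 local-lvm:cloudinit
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;--delete ide2&lt;/code&gt; call assumes the old drive sat on the &lt;code&gt;ide2&lt;/code&gt; slot, the GUI's usual default; check your VM's hardware tab first.&lt;/p&gt;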

&lt;p&gt;Here's the updated list of block devices. The VM now recognizes the cloud-init drive at &lt;code&gt;/dev/sr0&lt;/code&gt; and should execute your configuration on startup.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0s0a5ng8lfx0jnktl800.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0s0a5ng8lfx0jnktl800.png" alt="Block devices list of the VM. Cloud-init drive is on the list." width="438" height="138"&gt;&lt;/a&gt;&lt;/p&gt;
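&lt;p&gt;Inside the guest, you can double-check that cloud-init actually sees the drive. NoCloud config drives carry the &lt;code&gt;cidata&lt;/code&gt; filesystem label, so something like this should confirm it (exact output varies by distro):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;blkid --label cidata        # Prints the device holding the cloud-init data, e.g. /dev/sr0
cloud-init status --long    # Summarizes what cloud-init did on the last boot
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;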

&lt;h2&gt;
  
  
  Closing
&lt;/h2&gt;

&lt;p&gt;Questions? Thoughts? Feel free to leave a comment!&lt;/p&gt;

&lt;p&gt;Need someone skilled in RHEL, Kubernetes, and AWS? I'm open to work! View my &lt;a href="https://patimapoochai.github.io/" rel="noopener noreferrer"&gt;portfolio&lt;/a&gt; and reach out via &lt;a href="//www.linkedin.com/in/patima-poochai808"&gt;LinkedIn&lt;/a&gt; or &lt;a href="https://infosec.exchange/@patlikestechnology" rel="noopener noreferrer"&gt;Mastodon&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>tutorial</category>
      <category>linux</category>
      <category>proxmox</category>
    </item>
    <item>
      <title>How to fix the "opening socket 'charon.vici' failed: Permission denied" issue in Proxmox</title>
      <dc:creator>Patima Poochai</dc:creator>
      <pubDate>Mon, 02 Mar 2026 21:55:58 +0000</pubDate>
      <link>https://dev.to/patimapoochai/how-to-fix-the-opening-socket-charonvici-failed-permission-denied-issue-in-proxmox-cd4</link>
      <guid>https://dev.to/patimapoochai/how-to-fix-the-opening-socket-charonvici-failed-permission-denied-issue-in-proxmox-cd4</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpj3ib71acv7ozulpavw4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpj3ib71acv7ozulpavw4.png" alt="Visual representation of the charon.vici issue. AppArmor is depicted as shooting strongSwan's IPsec security associations." width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;I'm working on setting up SDN zones and IPsec encryption for my Proxmox VE 9 machines. I followed the usual &lt;a href="https://pve.proxmox.com/pve-docs/chapter-pvesdn.html" rel="noopener noreferrer"&gt;guide&lt;/a&gt; on creating and encrypting SDN zones, installed the usual stuff like &lt;code&gt;frr-pythontools&lt;/code&gt; and &lt;code&gt;strongswan&lt;/code&gt;, but then ran into this issue when using &lt;code&gt;swanctl&lt;/code&gt; to check the status of strongSwan:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;root@pve:/etc/apt/sources.list.d# swanctl &lt;span class="nt"&gt;--stats&lt;/span&gt;
plugin &lt;span class="s1"&gt;'test-vectors'&lt;/span&gt;: failed to load - test_vectors_plugin_create not found and no plugin file available
...
plugin &lt;span class="s1"&gt;'curl'&lt;/span&gt;: failed to load - curl_plugin_create not found and no plugin file available
opening socket &lt;span class="s1"&gt;'unix:///var/run/charon.vici'&lt;/span&gt; failed: Permission denied
Error: connecting to &lt;span class="s1"&gt;'default'&lt;/span&gt; URI failed: Permission denied
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At first, I thought this was a simple permissions issue, but the strongSwan service should be running as the root user. I checked the permissions of the socket at &lt;code&gt;/var/run/charon.vici&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;root@pve:/etc/apt/sources.list.d# &lt;span class="nb"&gt;ls&lt;/span&gt; &lt;span class="nt"&gt;-la&lt;/span&gt; /var/run/charon.vici
srwxrwx--- 1 root root 0 Feb 28 10:39 /var/run/charon.vici
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The root user should have permission on this socket, yet the logs claim it doesn't. What is going on?&lt;/p&gt;

&lt;h2&gt;
  
  
  Systemd Service Troubleshooting
&lt;/h2&gt;

&lt;p&gt;My first instinct was to check the Systemd logs. &lt;code&gt;strongswan.service&lt;/code&gt; is the service that manages IPsec, and it's what &lt;code&gt;swanctl&lt;/code&gt; connects to.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;root@pve:/etc/apt/sources.list.d# systemctl status strongswan
× strongswan.service - strongSwan IPsec IKEv1/IKEv2 daemon using swanctl
     Loaded: loaded &lt;span class="o"&gt;(&lt;/span&gt;/usr/lib/systemd/system/strongswan.service&lt;span class="p"&gt;;&lt;/span&gt; enabled&lt;span class="p"&gt;;&lt;/span&gt; preset: enabled&lt;span class="o"&gt;)&lt;/span&gt;
     Active: failed &lt;span class="o"&gt;(&lt;/span&gt;Result: exit-code&lt;span class="o"&gt;)&lt;/span&gt; since Sat 2026-02-28 10:39:09 HST&lt;span class="p"&gt;;&lt;/span&gt; 3h 55min ago
    ...
    Process: 21205 &lt;span class="nv"&gt;ExecStart&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/usr/sbin/charon-systemd &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;exited, &lt;span class="nv"&gt;status&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0/SUCCESS&lt;span class="o"&gt;)&lt;/span&gt;
    Process: 21234 &lt;span class="nv"&gt;ExecStartPost&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/usr/sbin/swanctl &lt;span class="nt"&gt;--load-all&lt;/span&gt; &lt;span class="nt"&gt;--noprompt&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;exited, &lt;span class="nv"&gt;status&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;13&lt;span class="o"&gt;)&lt;/span&gt;
   Main PID: 21205 &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;exited, &lt;span class="nv"&gt;status&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0/SUCCESS&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;StrongSwan is inactive; that's obvious. But one thing stands out: &lt;code&gt;charon-systemd&lt;/code&gt;, the process in the &lt;code&gt;ExecStart&lt;/code&gt; line, started up fine. The problem comes from &lt;code&gt;swanctl&lt;/code&gt;, which is invoked in &lt;code&gt;ExecStartPost&lt;/code&gt;. &lt;/p&gt;

&lt;p&gt;Let's look at the logs for &lt;code&gt;strongswan.service&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;root@pve:/etc/apparmor.d# journalctl &lt;span class="nt"&gt;-u&lt;/span&gt; strongswan.service 
Feb 28 10:39:09 pve systemd[1]: Starting strongswan.service - strongSwan IPsec IKEv1/IKEv2 daemon using swanctl...
Feb 28 10:39:09 pve charon-systemd[21205]: Starting charon-systemd IKE daemon &lt;span class="o"&gt;(&lt;/span&gt;strongSwan 6.0.1, Linux 6.17.2-1-pve, x86_64&lt;span class="o"&gt;)&lt;/span&gt;
...
Feb 28 10:39:09 pve charon-systemd[21205]: dropped capabilities, running as uid 0, gid 0
Feb 28 10:39:09 pve charon-systemd[21205]: spawning 16 worker threads
...
Feb 28 10:39:09 pve swanctl[21234]: opening socket &lt;span class="s1"&gt;'unix:///var/run/charon.vici'&lt;/span&gt; failed: Permission denied
Feb 28 10:39:09 pve swanctl[21234]: Error: connecting to &lt;span class="s1"&gt;'default'&lt;/span&gt; URI failed: Permission denied
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can see here that the main component of strongSwan, &lt;code&gt;charon-systemd&lt;/code&gt;, starts perfectly fine. However, loading the configuration files with &lt;code&gt;swanctl&lt;/code&gt; produces the same error, and this error is what caused Systemd to label &lt;code&gt;strongswan.service&lt;/code&gt; as "failed" and stop the service.&lt;/p&gt;

&lt;p&gt;While the logs didn't reveal the root cause, they helped me narrow it down to a single binary. And if I can fix this issue in &lt;code&gt;swanctl&lt;/code&gt;, &lt;code&gt;strongswan.service&lt;/code&gt; should stop failing as well.&lt;/p&gt;

&lt;h2&gt;
  
  
  Temporary Solution From AppArmor Troubleshooting
&lt;/h2&gt;

&lt;p&gt;I can't recall where I first got the idea, but while troubleshooting I stumbled upon a forum post pointing out that this might be an AppArmor issue and recommending a check of the full journalctl logs for AppArmor denials. So I ran &lt;code&gt;journalctl&lt;/code&gt;, looked for any logs related to &lt;code&gt;swanctl&lt;/code&gt;, and found this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pve kernel: audit: &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1400 audit&lt;span class="o"&gt;(&lt;/span&gt;1772327420.130:165&lt;span class="o"&gt;)&lt;/span&gt;: &lt;span class="nv"&gt;apparmor&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"DENIED"&lt;/span&gt; &lt;span class="nv"&gt;operation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"create"&lt;/span&gt; &lt;span class="nv"&gt;class&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"net"&lt;/span&gt; &lt;span class="nv"&gt;info&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"failed protocol match"&lt;/span&gt; &lt;span class="nv"&gt;error&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nt"&gt;-13&lt;/span&gt; &lt;span class="nv"&gt;profile&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"/usr/sbin/swanctl"&lt;/span&gt; &lt;span class="nv"&gt;pid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;64058 &lt;span class="nb"&gt;comm&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"swanctl"&lt;/span&gt; &lt;span class="nv"&gt;family&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"unix"&lt;/span&gt; &lt;span class="nv"&gt;sock_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"stream"&lt;/span&gt; &lt;span class="nv"&gt;protocol&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0 &lt;span class="nv"&gt;requested&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"create"&lt;/span&gt; &lt;span class="nv"&gt;denied&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"create"&lt;/span&gt; &lt;span class="nv"&gt;addr&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;none
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The log states that AppArmor prevented &lt;code&gt;swanctl&lt;/code&gt; from performing a "create" operation on a Unix socket. Looks like this denial might be the reason for the &lt;code&gt;Permission denied&lt;/code&gt; error, but let's confirm that this is really the issue.&lt;/p&gt;
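&lt;p&gt;To keep watching for further denials while testing, it helps to filter the kernel messages instead of scrolling the whole journal. These commands assume a systemd-based host like Proxmox:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# AppArmor denials arrive as kernel audit messages
journalctl -k --grep 'apparmor="DENIED"'
# Equivalent without the journal
dmesg | grep -i 'apparmor="DENIED"'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;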

&lt;p&gt;I tested the hypothesis by switching the AppArmor profile of &lt;code&gt;swanctl&lt;/code&gt; into complain mode, which logs violations instead of enforcing them, temporarily lifting any restrictions on the process.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;
apt &lt;span class="nb"&gt;install &lt;/span&gt;apparmor-utils &lt;span class="c"&gt;# Install packages that provide apparmor_parser&lt;/span&gt;
apparmor_parser &lt;span class="nt"&gt;-Cr&lt;/span&gt; usr.sbin.swanctl &lt;span class="c"&gt;# Disables AppArmor enforcement&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
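&lt;p&gt;If you find it more readable, the &lt;code&gt;apparmor-utils&lt;/code&gt; package also ships a wrapper that flips a profile into complain mode by path:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aa-complain /etc/apparmor.d/usr.sbin.swanctl   # Same effect as apparmor_parser -Cr
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;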



&lt;p&gt;Let's run &lt;code&gt;strongswan.service&lt;/code&gt; now.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;root@pve:/etc/apparmor.d# systemctl start strongswan
root@pve:/etc/apparmor.d# systemctl status strongswan
● strongswan.service - strongSwan IPsec IKEv1/IKEv2 daemon using swanctl
     Loaded: loaded &lt;span class="o"&gt;(&lt;/span&gt;/usr/lib/systemd/system/strongswan.service&lt;span class="p"&gt;;&lt;/span&gt; enabled&lt;span class="p"&gt;;&lt;/span&gt; preset: enabled&lt;span class="o"&gt;)&lt;/span&gt;
     Active: active &lt;span class="o"&gt;(&lt;/span&gt;running&lt;span class="o"&gt;)&lt;/span&gt; since Sat 2026-02-28 15:27:15 HST&lt;span class="p"&gt;;&lt;/span&gt; 8s ago
     ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;strongswan.service&lt;/code&gt; is running fine, but what about &lt;code&gt;swanctl&lt;/code&gt;?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;root@pve:/etc/apparmor.d# swanctl &lt;span class="nt"&gt;--stats&lt;/span&gt;
...
&lt;span class="nb"&gt;uptime&lt;/span&gt;: 3 seconds, since Feb 28 15:33:52 2026
worker threads: 16 total, 11 idle, working: 4/0/1/0
job queues: 0/0/0/0
&lt;span class="nb"&gt;jobs &lt;/span&gt;scheduled: 0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Yup, looks like AppArmor was the cause of this issue. The AppArmor denial is preventing &lt;code&gt;swanctl&lt;/code&gt; from creating or accessing the &lt;code&gt;/var/run/charon.vici&lt;/code&gt; socket, which also causes &lt;code&gt;strongswan.service&lt;/code&gt; to fail. By &lt;strong&gt;disabling AppArmor enforcement on &lt;code&gt;swanctl&lt;/code&gt;&lt;/strong&gt;, we can fix the &lt;code&gt;Permission denied&lt;/code&gt; issue.&lt;/p&gt;

&lt;p&gt;However, this is not the perfect solution. We can run strongSwan without AppArmor, but AppArmor provides MAC (Mandatory Access Control) rules that can prevent attackers from exploiting zero-day vulnerabilities in the strongSwan process. &lt;/p&gt;

&lt;p&gt;StrongSwan is more secure with AppArmor enabled, and if I can find a way to run AppArmor without denying IPsec, I'd take it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Trying Custom AppArmor Rules
&lt;/h2&gt;

&lt;p&gt;So, if AppArmor is causing the denials, what if we create a rule that allows &lt;code&gt;swanctl&lt;/code&gt; to access this socket? I found another &lt;a href="https://groups.google.com/g/linux.debian.bugs.dist/c/b6jFluenbpE" rel="noopener noreferrer"&gt;post&lt;/a&gt; with a potential solution. Essentially, the author modified &lt;code&gt;swanctl&lt;/code&gt;'s AppArmor profile to explicitly allow the process to create and access any required Unix sockets.&lt;/p&gt;

&lt;p&gt;Here's how I implemented the solution. I'm using VICI, strongSwan's newer interface, so I added the following to &lt;code&gt;/etc/apparmor.d/usr.sbin.swanctl&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;unix &lt;span class="o"&gt;(&lt;/span&gt;create&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;stream &lt;span class="c"&gt;# Don't actually add this line. This is my own attempt at solving this issue.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I reloaded the profile.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;apparmor_parser &lt;span class="nt"&gt;-r&lt;/span&gt; /etc/apparmor.d/usr.sbin.swanctl
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then I tried to start &lt;code&gt;strongswan.service&lt;/code&gt; with the new profile.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;root@pve:/etc/apparmor.d# systemctl start strongswan
Job &lt;span class="k"&gt;for &lt;/span&gt;strongswan.service failed because the control process exited with error code.
...
root@pve:/etc/apparmor.d# journalctl &lt;span class="nt"&gt;-u&lt;/span&gt; strongswan
...
Feb 28 17:38:25 pve swanctl[87986]: opening socket &lt;span class="s1"&gt;'unix:///var/run/charon.vici'&lt;/span&gt; failed: Permission denied
Feb 28 17:38:25 pve swanctl[87986]: Error: connecting to &lt;span class="s1"&gt;'default'&lt;/span&gt; URI failed: Permission denied
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same error, no dice. However, this attempt did show me that the issue might not come from the profile itself, but from something else related to AppArmor.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Real Root Cause And The Long-Term Solution
&lt;/h2&gt;

&lt;p&gt;It took a long couple of days of searching before I stumbled upon the hidden gem in this &lt;a href="https://github.com/containerd/containerd/issues/12726" rel="noopener noreferrer"&gt;containerd repository issue&lt;/a&gt;. Basically, &lt;a href="https://github.com/achernya" rel="noopener noreferrer"&gt;Alex Chernyakhovsky&lt;/a&gt; found that AppArmor recently changed its ABI, which broke the Unix socket networking options in AppArmor profiles. We have to wait for the upstream developers to fix this issue, but in the meantime we can &lt;strong&gt;configure AppArmor to use the previous ABI version&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Open &lt;code&gt;/etc/apparmor/parser.conf&lt;/code&gt; and add this line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Force pre-kernel 6.17 ABI
override-policy-abi=/etc/apparmor.d/abi/4.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we can reapply &lt;code&gt;swanctl&lt;/code&gt;'s profile. In my experience, I needed to clear the cache first using the &lt;code&gt;--purge-cache&lt;/code&gt; option before loading the new profile.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;apparmor_parser &lt;span class="nt"&gt;--purge-cache&lt;/span&gt;
apparmor_parser &lt;span class="nt"&gt;-r&lt;/span&gt; /etc/apparmor.d/usr.sbin.swanctl
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here's my result after applying this fix:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc9mp45qwjlfzccxh0f8t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc9mp45qwjlfzccxh0f8t.png" alt="Output of the swanctl command still saying Permission denied." width="800" height="414"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmkj8vwcnx8zg1xed8pcl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmkj8vwcnx8zg1xed8pcl.png" alt="Second output of the swanctl command now working and showing the status of strongSwan." width="800" height="540"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It works, though it still printed &lt;code&gt;Permission denied&lt;/code&gt; at first. I'm not sure why this happens, but in my experience, refreshing the AppArmor service and its policies a couple of times makes it read the new settings properly. It's fiddly like that.&lt;/p&gt;

&lt;p&gt;This is only a stopgap solution until the upstream developers fix this issue, but compared to not having the AppArmor protection on the process, it's a sufficient solution for now.&lt;/p&gt;
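&lt;p&gt;To confirm that the profile is loaded and back in enforce mode after the ABI workaround, &lt;code&gt;aa-status&lt;/code&gt; is handy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Lists every loaded profile grouped by mode; /usr/sbin/swanctl should
# appear under "profiles are in enforce mode"
aa-status | grep swanctl
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;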

&lt;h3&gt;
  
  
  Note: Vici vs Stroke
&lt;/h3&gt;

&lt;p&gt;I ran into this &lt;code&gt;Permission denied&lt;/code&gt; issue when using strongSwan's older stroke interface as well. Forcing AppArmor to use the older ABI should work with either interface.&lt;/p&gt;

&lt;h2&gt;
  
  
  So, does it work in the end?
&lt;/h2&gt;

&lt;p&gt;To verify that strongSwan is working properly and can establish security associations between my two Proxmox nodes, I deployed a simple test setup.&lt;/p&gt;

&lt;p&gt;I clustered two Proxmox nodes and created a simple VXLAN SDN zone:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs8zlxk49bbyycqmiq06z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs8zlxk49bbyycqmiq06z.png" alt="VXLAN zone configuration in the Proxmox nodes." width="393" height="402"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In both nodes, I set up a simple configuration for strongSwan in &lt;code&gt;/etc/swanctl/swanctl.conf&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;connections {
  vxlan {
    proposals = aes128-sha256-modp3072
    remote_addrs = 10.0.0.65 # Set to the other node's IP
    encap = yes
    local {
      auth = psk
    }
    remote {
      auth = psk
    }
    children {
      net-net {
        esp_proposals = aes128-sha256
        remote_ts = 0.0.0.0/0[udp/4789]
        local_ts = 0.0.0.0/0[udp/4789]
        mode = transport
        start_action = start
        updown = /usr/lib/ipsec/_updown iptables
      }
    }
  }
}

secrets {
  ike {
    secret = SECRET
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After applying it via &lt;code&gt;swanctl --load-all&lt;/code&gt;, here is the result when running &lt;code&gt;swanctl --list-sas&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxr79e6dop24bpolvfu7v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxr79e6dop24bpolvfu7v.png" alt="Swanctl output listing established IPsec security associations." width="800" height="193"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;strongSwan can now establish IPsec connections between the two nodes without AppArmor denials, and I can continue refining the IPsec configuration to secure the VXLAN communications.&lt;/p&gt;
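&lt;p&gt;A quick way to prove the VXLAN traffic is actually encrypted is to sniff the link between the nodes; with transport-mode ESP in place, you should see ESP packets rather than cleartext UDP/4789. The interface name here is a placeholder for whichever NIC carries the inter-node traffic:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;tcpdump -ni eth0 esp               # IPsec working: VXLAN frames ride inside ESP
tcpdump -ni eth0 'udp port 4789'   # IPsec broken: raw VXLAN encapsulation visible
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;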

&lt;h2&gt;
  
  
  TLDR
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Why is this happening?
&lt;/h3&gt;

&lt;p&gt;A recent AppArmor ABI change broke some Unix socket networking options in profiles.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do you fix this?
&lt;/h3&gt;

&lt;p&gt;Switch to the older ABI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;echo &lt;/span&gt;override-policy-abi&lt;span class="o"&gt;=&lt;/span&gt;/etc/apparmor.d/abi/4.0 | &lt;span class="nb"&gt;tee&lt;/span&gt; &lt;span class="nt"&gt;-a&lt;/span&gt; /etc/apparmor/parser.conf

systemctl restart apparmor.service &lt;span class="c"&gt;# (optional) This is to make sure AppArmor is reloaded properly&lt;/span&gt;
apparmor_parser &lt;span class="nt"&gt;--purge-cache&lt;/span&gt; &lt;span class="c"&gt;# Needed on my end because my setup won't refresh swanctl's profile&lt;/span&gt;
apparmor_parser &lt;span class="nt"&gt;-r&lt;/span&gt; /etc/apparmor.d/usr.sbin.swanctl
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Thoughts? Comments? Questions?
&lt;/h3&gt;

&lt;p&gt;Feel free to leave a comment!&lt;/p&gt;

</description>
      <category>linux</category>
      <category>networking</category>
    </item>
    <item>
      <title>Building a Self-hosted IAM Platform to Add SSO to My Home Lab</title>
      <dc:creator>Patima Poochai</dc:creator>
      <pubDate>Mon, 19 May 2025 08:00:51 +0000</pubDate>
      <link>https://dev.to/patimapoochai/building-a-self-hosted-iam-platform-to-add-sso-to-my-home-lab-5a2n</link>
      <guid>https://dev.to/patimapoochai/building-a-self-hosted-iam-platform-to-add-sso-to-my-home-lab-5a2n</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp6dtw1ng87barpf3mwjb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp6dtw1ng87barpf3mwjb.png" alt="diagram of the project showing how the Keycloak and LLDAP maps to the IAM platform architecture" width="800" height="615"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A growing problem with my home lab is password fatigue. Each time I add a new service to my network, I generate a new random password for it. Storing a different password for every service in a password manager worked at first, but once the number of services exceeded double digits, opening my password manager every time I wanted to access a service started to hinder my productivity. I needed a self-hosted SSO solution for my home lab services, so I deployed an open-source IAM platform that supports SSO via the OIDC, OAuth, and LDAP protocols to my Kubernetes cluster.&lt;/p&gt;

&lt;h2&gt;
  
  
  Walkthrough
&lt;/h2&gt;

&lt;p&gt;The initial design was simple. I only needed SSO for my services, namely Grafana and Syncthing, and my solution must support both modern protocols like OAuth/OIDC and legacy protocols like LDAP. I used &lt;a href="https://youtu.be/5uNifnVlBy4?si=C2xiKW8gnUEgNrNV" rel="noopener noreferrer"&gt;IBM's IAM architecture video&lt;/a&gt; as an inspiration to draft a simple IAM stack using a modern identity management system that integrates with a single, centralized LDAP directory store.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F314h4c6lcn9zm9oua19y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F314h4c6lcn9zm9oua19y.png" alt="my version of the IBM IAM architecture diagram, showing the three layres of an IAM system" width="800" height="1045"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The architecture of an IAM platform is made up of three layers: the base infrastructure layer, the application layer, and the connection layer. The base layer is composed of a directory store (a repository for identity information) and synchronization (the ability for multiple directories to share identity information with each other). There were many self-hosted LDAP directory servers available, like the &lt;a href="https://www.port389.org/" rel="noopener noreferrer"&gt;389 Directory Server&lt;/a&gt; and &lt;a href="https://www.freeipa.org/page/Main_Page" rel="noopener noreferrer"&gt;FreeIPA&lt;/a&gt;, but I chose &lt;a href="https://github.com/lldap/lldap" rel="noopener noreferrer"&gt;LLDAP&lt;/a&gt; as the centralized directory store because of its simple configuration and low resource usage.&lt;/p&gt;

&lt;p&gt;The application layer contains the software that implements IAM workflows like administration, access management, and roles. For this layer, I used &lt;a href="https://www.keycloak.org/" rel="noopener noreferrer"&gt;Keycloak&lt;/a&gt; to provide the functionality of an SSO system, like a user login interface and SSO redirection. The connection layer deals with identity federation across multiple IAM platforms, but because the scope of my project is only to deploy an SSO provider to my home lab network, implementing this layer is unnecessary.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzkdnndr52ehp03ruygbq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzkdnndr52ehp03ruygbq.png" alt="dashboard of LLDAP showing a list of users" width="800" height="348"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I started by setting up a directory store using an LLDAP server. LLDAP is a minimal LDAP server with very basic features, and it can only integrate with a limited range of LDAP services. I deployed the LLDAP server alongside a Keycloak instance using Helm and configured a read-only federation with Keycloak to synchronize the users in its directory with the users from the LLDAP directory. LLDAP also doesn't support modern authentication methods like OAuth and OIDC, so setting up federation allowed users within the LLDAP directory to authenticate with applications that don't support LDAP.&lt;/p&gt;

&lt;p&gt;Federation also allowed me to manage identities across both tools in one centralized directory store, as changes in the LLDAP directory will also be reflected in Keycloak's directory. This configuration created the least management overhead while providing the widest compatibility for both legacy applications using LDAP and modern applications using OAuth and OIDC.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6o9zpqomzl6qzo0wj36i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6o9zpqomzl6qzo0wj36i.png" alt="the keycloak configuration to synchronize it to LLDAP" width="800" height="425"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The IAM platform is pretty much complete; now I only need to configure my applications to use it as the SSO provider. First, Grafana supports OAuth/OIDC authentication, so I registered Grafana as a "client" of Keycloak and configured it to redirect any user sign-in requests to Keycloak's authentication page. After the user signs in with their Keycloak account, Grafana receives ID and access tokens from Keycloak that contain the user's identity information.&lt;/p&gt;
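&lt;p&gt;The client side of this lives in Grafana's &lt;code&gt;[auth.generic_oauth]&lt;/code&gt; section. The sketch below uses a placeholder host name and realm; the keys themselves are standard Grafana options:&lt;/p&gt;

```ini
[auth.generic_oauth]
enabled = true
name = Keycloak
client_id = grafana
client_secret = ********
scopes = openid email profile
; Keycloak realm endpoints (host and realm name are placeholders)
auth_url = http://keycloak.home.internal/realms/homelab/protocol/openid-connect/auth
token_url = http://keycloak.home.internal/realms/homelab/protocol/openid-connect/token
api_url = http://keycloak.home.internal/realms/homelab/protocol/openid-connect/userinfo
```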

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyy6n8xnvoo5shritc2fh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyy6n8xnvoo5shritc2fh.png" alt="the Grafana OAuth/OIDC configuration" width="800" height="550"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Other applications that don't support OAuth/OIDC authentication, like Syncthing, can use LLDAP as the SSO provider directly. This setup is less secure, as the application binds with the LLDAP admin account directly to query the identity information for each login.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7oknx5f8o09vpbkzo3od.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7oknx5f8o09vpbkzo3od.png" alt="the Syncthing LDAP configuration" width="800" height="409"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;This project was a great introduction to IAM architecture and its security protocols, and it will help me improve my future projects. Certain side projects are too small to use an enterprise IAM solution, yet too big to skip some form of user authentication. One project that comes to mind is the &lt;a href="https://github.com/Oliveriver/5d-diplomacy-with-multiverse-time-travel" rel="noopener noreferrer"&gt;5D Diplomacy With Multiverse Time Travel&lt;/a&gt; game. It's a web game that was initially released as a self-hosted project without user authentication, which created a huge barrier to entry for non-technical players who'd rather try it out quickly on a public instance.&lt;/p&gt;

&lt;p&gt;Projects like this could lower the barrier to entry and gain a lot of visibility by offering an IAM system, yet it's a huge pain to write your own authentication code. Learning to use off-the-shelf IAM tools like Keycloak and LLDAP saves you from reinventing the wheel, and because these tools use industry-standard protocols, you can migrate your projects to an enterprise IAM solution in the future.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbdwij8lzmx0oi3zzlvf5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbdwij8lzmx0oi3zzlvf5.png" alt="image of the 5d diplomacy login page" width="800" height="537"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;View my project on &lt;a href="https://github.com/patimapoochai/self-hosted-iam" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>What I've Learned From Troubleshooting OIDC/OAuth SSO Errors in Grafana and Keycloak</title>
      <dc:creator>Patima Poochai</dc:creator>
      <pubDate>Wed, 07 May 2025 06:38:41 +0000</pubDate>
      <link>https://dev.to/patimapoochai/a-fun-afternoon-troubleshooting-oidcoauth-sso-errors-in-grafana-and-keycloak-2mb5</link>
      <guid>https://dev.to/patimapoochai/a-fun-afternoon-troubleshooting-oidcoauth-sso-errors-in-grafana-and-keycloak-2mb5</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj3uxi6031xqrniblrbfh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj3uxi6031xqrniblrbfh.png" alt="Diagram of the process to troubleshoot OIDC/OAuth SSO error by performing diagnostics on the browser, Grafana, and Keycloak." width="800" height="744"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I'm working on adding a simple IAM stack to &lt;a href="https://dev.to/patimapoochai/building-a-declarative-home-lab-using-k3s-ansible-helm-on-nixos-and-rocky-linux-21l5"&gt;my home lab&lt;/a&gt; to enable SSO for my services like Grafana. I configured the Grafana instance to allow for SSO authentication using Keycloak as the provider and OIDC/OAuth as the protocols. When I clicked the SSO option on the Grafana login page, however, I got an error after it redirected the browser to Keycloak's sign-in page.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3f5rliu3p1f30mz4u524.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3f5rliu3p1f30mz4u524.png" alt="Keycloak login page showing an error message saying that the " width="800" height="424"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The page said that the &lt;code&gt;redirect_uri&lt;/code&gt; was invalid, and online forums described this error as being caused by the Valid Redirect URIs list in Keycloak not matching the URL of the Grafana instance. My instance is accessed at &lt;code&gt;http://nextcloud.home.internal:30007&lt;/code&gt;, and when I checked the redirect URI list in Keycloak, this URL was included. If that isn't the cause of the error, then what is?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbckd5t8eeocaxh6zb11l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbckd5t8eeocaxh6zb11l.png" alt="a picture of the URL of the Grafana OAuth configuration page with the URL of the instance shown" width="800" height="457"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8r9lty3h3w2h9q7bzvg3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8r9lty3h3w2h9q7bzvg3.png" alt="Keycloak settings page showing the Valid Redirection URI list containing the URL of the Grafana endpoint" width="800" height="457"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Maybe there's a clue in the browser. I checked the console output after clicking the SSO login button to see if there was something wrong with the login flow in the front end. I found that the request URL included a &lt;code&gt;redirect_uri&lt;/code&gt; query parameter containing the source of the redirection (which should be set to the URL of the Grafana instance), and it was set to &lt;code&gt;http://localhost:3000&lt;/code&gt;. That obviously doesn't match the valid redirect URIs set in Keycloak, but I didn't see anything in the Grafana OAuth settings that would make the redirection use this URL.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb8rsxtdt9uqrd3yv92v5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb8rsxtdt9uqrd3yv92v5.png" alt="The Keycloak error page with the URL showing that it's being redirected from http://localhost:3000" width="800" height="377"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Maybe I should look for a setting that is set to &lt;code&gt;http://localhost:3000&lt;/code&gt; somewhere outside of the OAuth configuration page. I browsed all the possible settings within Grafana and found something in the &lt;code&gt;General&lt;/code&gt; settings.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjqisj93f4bqg3z4250bt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjqisj93f4bqg3z4250bt.png" alt="The general settings page of Grafana, with " width="800" height="437"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;domain&lt;/code&gt; and &lt;code&gt;http_port&lt;/code&gt; settings on this page have values similar to the &lt;code&gt;redirect_uri&lt;/code&gt; query parameter in the error, and both values are combined inside the template of the &lt;code&gt;root_url&lt;/code&gt; setting.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Jackpot&lt;/em&gt;. It seems that when Grafana redirects a user to an identity provider's SSO page, the value of &lt;code&gt;root_url&lt;/code&gt; is used as the &lt;code&gt;redirect_uri&lt;/code&gt; query parameter. Therefore, this value must match the publicly accessible URL of the Grafana instance and be included in Keycloak's valid redirect URI list. Otherwise, the user will get an &lt;code&gt;invalid parameter: redirect_uri&lt;/code&gt; error.&lt;/p&gt;
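&lt;p&gt;To make that behavior concrete, here's a minimal sketch (not Grafana's actual source; &lt;code&gt;build_authorize_url&lt;/code&gt; is a hypothetical helper) of how an OAuth client derives the &lt;code&gt;redirect_uri&lt;/code&gt; parameter from its configured root URL:&lt;/p&gt;

```python
from urllib.parse import urlencode

# Hypothetical sketch: an OAuth client builds its authorization request
# by appending the callback path to the configured root URL.
def build_authorize_url(auth_endpoint, client_id, root_url):
    params = urlencode({
        "client_id": client_id,
        "response_type": "code",
        "redirect_uri": root_url.rstrip("/") + "/login/generic_oauth",
    })
    return auth_endpoint + "?" + params

# With a default root_url, the identity provider sees localhost,
# which won't match Keycloak's valid redirect URI list.
url = build_authorize_url(
    "http://keycloak.home.internal/realms/homelab/protocol/openid-connect/auth",
    "grafana",
    "http://localhost:3000/",
)
```

&lt;p&gt;Whatever the client configures as its root URL ends up URL-encoded in the request, which is exactly the value Keycloak validates against its list.&lt;/p&gt;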

&lt;p&gt;The fix is simple. I modified Grafana's configuration file, &lt;code&gt;defaults.ini&lt;/code&gt;, changing the &lt;code&gt;domain&lt;/code&gt; and &lt;code&gt;http_port&lt;/code&gt; values so that, when they are combined into the &lt;code&gt;root_url&lt;/code&gt; value, they match the public URL of my Grafana instance at &lt;code&gt;http://nextcloud.home.internal:30007&lt;/code&gt;.&lt;/p&gt;
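&lt;p&gt;The relevant part of &lt;code&gt;defaults.ini&lt;/code&gt; ends up looking like this (the &lt;code&gt;root_url&lt;/code&gt; template shown is Grafana's default interpolation):&lt;/p&gt;

```ini
[server]
; These two values are interpolated into root_url, which in turn
; becomes the redirect_uri sent to Keycloak.
domain = nextcloud.home.internal
http_port = 30007
root_url = %(protocol)s://%(domain)s:%(http_port)s/
```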

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7u0rre39qhqcifs0qcfv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7u0rre39qhqcifs0qcfv.png" alt="The defaults.ini file with the correct values" width="800" height="295"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Grafana pod was initially set to expose port &lt;code&gt;3000&lt;/code&gt;, and I found that I had to change it to expose port &lt;code&gt;30007&lt;/code&gt; instead for Grafana to work properly. Grafana listens inside the container on the port set in &lt;code&gt;http_port&lt;/code&gt;, so you have to adjust the port that the pod exposes accordingly.&lt;/p&gt;
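&lt;p&gt;As a sketch, the matching Kubernetes Service looks something like this (names and labels are placeholders, not my exact manifest):&lt;/p&gt;

```yaml
apiVersion: v1
kind: Service
metadata:
  name: grafana
spec:
  type: NodePort
  selector:
    app: grafana
  ports:
    - port: 30007        # matches http_port in defaults.ini
      targetPort: 30007  # Grafana listens on http_port inside the container
      nodePort: 30007    # within the default NodePort range (30000-32767)
```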

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqp1mlmkmt816tbhmf8rv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqp1mlmkmt816tbhmf8rv.png" alt="The configuration of the Grafana instance's Kubernetes service which is set to expose port 30007 on the container" width="800" height="250"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After a quick restart of the Grafana deployment, I tried using the OAuth SSO option when logging into Grafana again. And voila!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh5mx4wiuweyivtfr3jwb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh5mx4wiuweyivtfr3jwb.png" alt="Keycloak SSO login page working correctly" width="800" height="550"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I was able to use the Keycloak login page to authenticate with Grafana, logging in with the credentials of a user stored in the Keycloak database.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe356c1yo747tkf5xr4o5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe356c1yo747tkf5xr4o5.png" alt="Grafana dashboard showing that the SSO login was successful and the browser is logged into " width="800" height="348"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Because this user is stored in the Keycloak database, I could then centrally manage this user in the Keycloak administrator GUI.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnd2wfd7qw6czd4nhtgfj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnd2wfd7qw6czd4nhtgfj.png" alt="Keycloak management interface showing that " width="800" height="299"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It was a surprise to learn that the &lt;code&gt;root_url&lt;/code&gt; value is used as the redirection URL for Grafana, because this wasn't mentioned in the Grafana OIDC/OAuth documentation. Still, it was an interesting experience that gave me new insight into how Grafana manages OIDC/OAuth SSO redirection.&lt;/p&gt;

</description>
      <category>devops</category>
      <category>opensource</category>
      <category>security</category>
      <category>iam</category>
    </item>
    <item>
      <title>Building a declarative home lab using K3s, Ansible, Helm on NixOS and Rocky Linux</title>
      <dc:creator>Patima Poochai</dc:creator>
      <pubDate>Mon, 31 Mar 2025 01:32:01 +0000</pubDate>
      <link>https://dev.to/patimapoochai/building-a-declarative-home-lab-using-k3s-ansible-helm-on-nixos-and-rocky-linux-21l5</link>
      <guid>https://dev.to/patimapoochai/building-a-declarative-home-lab-using-k3s-ansible-helm-on-nixos-and-rocky-linux-21l5</guid>
      <description>&lt;p&gt;I set up a home lab to run services like Prometheus, Grafana, and more on K3s and Docker. I deployed these services using Helm and configured the operating system with Ansible. These services are running on two Beelink mini PCs using NixOS and Rocky Linux for the operating system layer.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjvqibqim1g5f3ygw5c2t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjvqibqim1g5f3ygw5c2t.png" alt="high-level diagram of the setup showing the network map, the two hosts and their OS, the boundary where k3s and docker services lie, and the services namely Prometheus and Grafana" width="547" height="842"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F26fc79m2n6dmoklxigop.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F26fc79m2n6dmoklxigop.jpg" alt="picture of the front of the ZimaBoard, switch, and the two Beelinks" width="800" height="434"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Core Features
&lt;/h2&gt;

&lt;p&gt;My home lab provides the following functionality:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Observability with system health and performance monitoring from &lt;strong&gt;Prometheus&lt;/strong&gt; and dashboard visualization from &lt;strong&gt;Grafana&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Configuration management and automation using &lt;strong&gt;Ansible&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Containerized services orchestrated with the &lt;strong&gt;K3s&lt;/strong&gt; flavor of Kubernetes&lt;/li&gt;
&lt;li&gt;Declarative container deployment using &lt;strong&gt;Helm&lt;/strong&gt; and &lt;strong&gt;Docker Compose&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Declarative operating system configuration with &lt;strong&gt;NixOS&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Version control system hosting local code repositories on &lt;strong&gt;Forgejo&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;File storage server with &lt;strong&gt;Nextcloud&lt;/strong&gt; and file synchronization with &lt;strong&gt;Syncthing&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Self-hosted application launcher using &lt;strong&gt;Heimdall&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Pain of the Manual
&lt;/h2&gt;

&lt;p&gt;I love working on infrastructure deployment and operations. Before this project, I used to tinker with smaller labs using Linux servers like Proxmox and Ubuntu. I also love learning about new open-source, self-hosted services that could boost my productivity and expand my skill set.&lt;/p&gt;

&lt;p&gt;This time, however, I had one new objective: to have a home lab that is fun to manage. My past labs caused a lot of pain because my devices were configured manually. When something (inevitably) goes wrong, I often don't have the time to dig through the logs, trace each step, and figure out what happened. I'd rather reset everything to get the service back online, prioritizing availability.&lt;/p&gt;

&lt;p&gt;However, this preference came with a big cost because I'd have to configure everything by hand again. It was discouraging to retrace my steps and relearn how to apply the same settings over and over again. It also limited the scope of my home lab because the risk of having to reapply the same configurations increases the more I add new devices and experiment with new services.&lt;/p&gt;

&lt;p&gt;So I came up with a plan: I'm going to &lt;strong&gt;use as much X-as-code technology as I can&lt;/strong&gt; for this project. I wanted to use Kubernetes. I wanted to codify as much hands-on Linux configuration into Ansible as possible. I even went as far as planning to use NixOS, an experimental Linux distribution where &lt;em&gt;every&lt;/em&gt; system configuration is in code. This goal became a major influence on how I chose the technology I used for this project.&lt;/p&gt;

&lt;h2&gt;
  
  
  Technologies Used
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Observability tools&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://prometheus.io/" rel="noopener noreferrer"&gt;Prometheus&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://grafana.com/" rel="noopener noreferrer"&gt;Grafana&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Deployment and configuration management tools&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://helm.sh/" rel="noopener noreferrer"&gt;Helm&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.ansible.com/ansible/latest/index.html" rel="noopener noreferrer"&gt;Ansible&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Containerization and orchestration&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Kubernetes with &lt;a href="https://k3s.io/" rel="noopener noreferrer"&gt;K3s&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.docker.com/compose/" rel="noopener noreferrer"&gt;Docker Compose&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Operating system layer&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://nixos.org/" rel="noopener noreferrer"&gt;NixOS&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://rockylinux.org/" rel="noopener noreferrer"&gt;Rocky Linux&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnhwb0jyp7c9xy98yqgfs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnhwb0jyp7c9xy98yqgfs.png" alt="diagram of each layer of the core features and the tools that are used for each layer" width="480" height="481"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Why I chose these technologies
&lt;/h3&gt;

&lt;p&gt;This project got started when I found the &lt;a href="https://www.amazon.com/Beelink-Lake-N150-Upgraded-Computer-Business/dp/B0DP2SGVVY?crid=36HCIJKITJN2B&amp;amp;dib=eyJ2IjoiMSJ9.aVqwMZh9Fvu-6xw_P6_JgovLN5sln2oOcTSUjwQQiUTC_qiDysVZ3qucHMVfcgIp8Gqw6x697gJihPdfsCQvF2SwYgO2lFPD3oEzWczH2dkvy-vGkEjSyAvevFtczarBjSq05tzfvaCdIMcrV6oJJewATWlaPuPg4ptQxul5Eg60TnrntatfIKSFT9e1uMHLuIre8oIzYF9gwUG9YvZN9g.oC5r2_G7cWkC2fWlxWQ5ZL9gW1JFlEquuRhtDNZXPds&amp;amp;dib_tag=se&amp;amp;keywords=beelink+mini+pc&amp;amp;qid=1743207996&amp;amp;sprefix=beelink+mini+p%2Caps%2C422&amp;amp;sr=8-2" rel="noopener noreferrer"&gt;Beelink mini PCs&lt;/a&gt; (not sponsored) while browsing Amazon. I've seen some people recommending these mini PCs due to their low power consumption, decent performance, and relatively cheap price ($139 at the time of writing). They would be the perfect bare-metal layer for my lab environment as I can buy and set up as many of them as I want without breaking the bank.&lt;/p&gt;

&lt;p&gt;I chose Rocky Linux as the OS for my first machine, as it's a free rebuild of Red Hat's RHEL, and I'm most familiar with RHEL-based distributions from the time spent studying for the RHCSA certification. The second machine runs NixOS as a way for me to get started with the declarative approach to Linux server configuration. NixOS is one of the most declarative ways to run your infrastructure, so it's the perfect tool for turning as much of my lab as possible into code.&lt;/p&gt;

&lt;p&gt;The decision to use mini PCs as the host machines heavily affected the tools I chose for the container runtime and orchestration layer. As each machine has decent but limited performance, K3s was the only practical choice for running a Kubernetes cluster because of its low resource requirements. I didn't consider using Helm with K3s at first, but I started using it midway through the project because it was getting too difficult to manage the Kubernetes manifests for my services.&lt;/p&gt;

&lt;p&gt;However, I still wanted a few services to run on Docker using Docker Compose. These services are essential for my day-to-day productivity, and I wanted them to be accessible even if K3s is down.&lt;/p&gt;

&lt;p&gt;For transcribing manual Linux configuration into code, I turned to Ansible because it's compatible with both Rocky Linux and NixOS while having an easy-to-learn syntax. It also doesn't require an agent to be installed on each host, so I can provision new machines without setting up an OS imaging pipeline.&lt;/p&gt;

&lt;p&gt;By the end of the project, I got interested in adding a monitoring stack to my lab from listening to The Pragmatic Engineer's podcast episode on &lt;a href="https://newsletter.pragmaticengineer.com/p/observability-the-present-and-future" rel="noopener noreferrer"&gt;Observability with Charity Majors&lt;/a&gt;. I wanted to take the simplest path to provide observability, so I chose the Prometheus + Grafana stack.&lt;/p&gt;

&lt;h2&gt;
  
  
  Highlights
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Wi-Fi bridge networking
&lt;/h3&gt;

&lt;p&gt;First, I started with the networking. I had an issue where there were no direct ethernet ports that could connect my home lab to the internet, so I set up a Wi-Fi extender as a bridge and used its ethernet output to create a nested network within my home. The traffic between the devices in the lab will be routed internally, while the internet-bound traffic will be forwarded over the Wi-Fi extender. View more details about this process in &lt;a href="https://dev.to/patimapoochai/how-to-run-a-home-lab-without-an-ethernet-port-220j"&gt;my blog post&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp1m4b4ha9feht8k5qfnm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp1m4b4ha9feht8k5qfnm.png" alt="image the network map with two networks with one of them nested inside the other" width="712" height="431"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Configuration management and automation
&lt;/h3&gt;

&lt;p&gt;I then installed Rocky Linux on my first machine and forced myself to use only Ansible to manage its configuration.&lt;/p&gt;

&lt;p&gt;I wrote an Ansible playbook to set up the host to be managed by Ansible. Some of the tasks that I wrote into the playbook are: creating an SSH key pair, uploading the key to the host, and setting up privilege escalation for the Ansible user account. I also wrote a playbook to optimize the system's power consumption by using the &lt;code&gt;tuned&lt;/code&gt; package to apply battery-saving settings to the system.&lt;/p&gt;
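&lt;p&gt;A condensed sketch of that bootstrap playbook is below. The host group, account name, and key path are placeholders for my setup, but the modules are standard Ansible:&lt;/p&gt;

```yaml
- name: Bootstrap a host for Ansible management
  hosts: rocky
  become: true
  tasks:
    - name: Create the Ansible service account
      ansible.builtin.user:
        name: ansible
        state: present

    - name: Upload the controller's public SSH key
      ansible.posix.authorized_key:
        user: ansible
        key: "{{ lookup('file', '~/.ssh/id_ed25519.pub') }}"

    - name: Allow passwordless privilege escalation
      ansible.builtin.copy:
        dest: /etc/sudoers.d/ansible
        content: "ansible ALL=(ALL) NOPASSWD: ALL\n"
        mode: "0440"
        validate: visudo -cf %s

    - name: Apply the power-saving tuned profile
      ansible.builtin.command: tuned-adm profile powersave
```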

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcx7pbbstoot4747haeeo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcx7pbbstoot4747haeeo.png" alt="snippet of the ansible code to apply power-saving settings to the host machine" width="723" height="244"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It took quite a while to write the playbooks to get the machine ready for running container services. However, it was worth the effort because if the machine breaks and needs a reset, I can just rerun the Ansible playbook again, and it will be up and running in no time. I don't have to do anything by hand as all of the configuration to prepare the system for running services is defined in the configuration management tool.&lt;/p&gt;

&lt;h3&gt;
  
  
  Productivity and essential services
&lt;/h3&gt;

&lt;p&gt;There were a few services that were essential for my productivity, so I used Docker Compose to deploy them. I wanted services like Forgejo, Nextcloud, and Syncthing to be available at all times, even if K3s is unavailable. I have the least tolerance for downtime with Forgejo in particular, as it will host the Kubernetes manifests and Ansible playbooks required to deploy everything else in the future.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpl7n9632bhg8ham82qqs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpl7n9632bhg8ham82qqs.png" alt="image of a Forgejo future ansible and helm repository" width="800" height="377"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I wrote Ansible playbooks to automate the deployment of the Docker Compose services to the Rocky Linux host. Some configurations can't be set within the Docker Compose files, like adding firewall rules to the host to open the necessary ports for each service, so those settings are applied by the playbook. This extra automation layer makes the whole process declarative and written as code.&lt;/p&gt;
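&lt;p&gt;Each of those playbooks follows roughly this shape: open the service's ports with &lt;code&gt;firewalld&lt;/code&gt;, then bring the Compose project up. The port and project path below are illustrative placeholders:&lt;/p&gt;

```yaml
- name: Deploy an essential Docker Compose service
  hosts: rocky
  become: true
  tasks:
    - name: Open the service's web port on the host firewall
      ansible.posix.firewalld:
        port: 3000/tcp
        permanent: true
        immediate: true
        state: enabled

    - name: Start the Compose project
      community.docker.docker_compose_v2:
        project_src: /opt/forgejo
```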

&lt;h3&gt;
  
  
  Container orchestration
&lt;/h3&gt;

&lt;p&gt;Ansible was also used to deploy K3s on the Rocky Linux host as the container orchestration layer. While the installation process for K3s is very simple (just running a shell script), the software requires a few special firewall rules to classify the traffic originating from the container network as "trusted." Adding these rules only takes a few commands, but I still worked them into a playbook. The time it takes to run a few commands manually adds up fast when you have to set up multiple nodes in the future.&lt;/p&gt;
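&lt;p&gt;For RHEL-based hosts, the K3s documentation suggests trusting the default pod and service CIDRs in &lt;code&gt;firewalld&lt;/code&gt;; as an Ansible task (host group is a placeholder), that rule looks roughly like this:&lt;/p&gt;

```yaml
- name: Trust K3s container traffic
  hosts: rocky
  become: true
  tasks:
    - name: Mark the pod and service CIDRs as trusted
      ansible.posix.firewalld:
        zone: trusted
        source: "{{ item }}"
        permanent: true
        immediate: true
        state: enabled
      loop:
        - 10.42.0.0/16  # default K3s pod CIDR
        - 10.43.0.0/16  # default K3s service CIDR
```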

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmpflt72kynnd4chaen32.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmpflt72kynnd4chaen32.png" alt="image of the configuration to allow traffic coming from the container CIDR" width="800" height="431"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Declarative containers
&lt;/h3&gt;

&lt;p&gt;I didn't plan to use Helm initially. Helm wasn't included in the curriculum for CKA when I took it, so I assumed that it wasn't needed if you're doing SysOps work (like setting up a home lab infrastructure). I also thought that it would be difficult to learn and that the time saved from using this tool would be too small. However, it was getting more difficult to manage the ever-growing number of Kubernetes manifests, so I decided to pick up Helm. It turned out to be a simple tool to learn, and it made the deployment process faster by bundling all of the manifests into one package.&lt;/p&gt;

&lt;p&gt;The first service I deployed with Helm was Heimdall, which provides a central place to host the shortcuts to all of the self-hosted services in my lab.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmlc2wbs90ezmprqobo1h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmlc2wbs90ezmprqobo1h.png" alt="image of a successful Helm deployment with the shortcuts to each service" width="800" height="453"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Metrics, monitoring, and visualization
&lt;/h3&gt;

&lt;p&gt;After picking up Helm, I wrote the service manifests for the Prometheus and Grafana services as Helm charts and used Ansible to automate their deployment to my K3s cluster. I also wrote a playbook to deploy the Prometheus node exporter binary, upload the configuration files, and open the firewall ports for metrics collection.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fymt8gzui12ekowkm10l2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fymt8gzui12ekowkm10l2.png" alt="image of a running Prometheus server" width="800" height="255"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Using the data collected from the node exporter, I could visualize each system metric in a dashboard by connecting Grafana to the Prometheus server.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fypw9q2jpwq8kgqx8xhxq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fypw9q2jpwq8kgqx8xhxq.png" alt="image of the Grafana dashboard" width="800" height="327"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Troubleshooting Prometheus
&lt;/h3&gt;

&lt;p&gt;While deploying Prometheus, I ran into an &lt;code&gt;open /prometheus/queries.active: permission denied&lt;/code&gt; issue. After a few days of troubleshooting, I found that Prometheus requires specific permissions on the host directory when it's bind-mounted into the container. I fixed this by adding a task to the Prometheus deployment playbook that changes the owner of the host directory to the same user as the one in the container. You can read how I troubleshot and fixed this issue in &lt;a href="https://dev.to/patimapoochai/how-to-fix-prometheus-open-prometheusqueriesactive-permission-denied-on-kubernetes-22gf"&gt;my blog post&lt;/a&gt;.&lt;/p&gt;
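
&lt;p&gt;A sketch of what that ownership task can look like is below. The host path is hypothetical, and UID 65534 (the &lt;code&gt;nobody&lt;/code&gt; user the official Prometheus image runs as) should be verified against the image you actually deploy.&lt;/p&gt;

```yaml
# Sketch: make the bind-mounted data directory writable by the container user.
# /srv/prometheus is a hypothetical host path; 65534 is the "nobody" UID used
# by the official Prometheus image -- verify it against your image.
- name: Set Prometheus data directory ownership
  ansible.builtin.file:
    path: /srv/prometheus
    state: directory
    owner: "65534"
    group: "65534"
    recurse: true
  become: true
```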

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4wvm2fydsdh1g971u1go.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4wvm2fydsdh1g971u1go.png" alt="image of the bind mount issue, showing the permission denied error" width="800" height="205"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I also applied this lesson when writing the Helm chart for Grafana by setting the host directory's owner to have the same UID as the one defined in Grafana's &lt;a href="https://github.com/grafana/grafana/blob/28b142e9513f587c4be62801794a8609037adbe8/Dockerfile#L138" rel="noopener noreferrer"&gt;Dockerfile&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmxo3df53qxtyfdc0x53w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmxo3df53qxtyfdc0x53w.png" alt="image of Grafana's docker files" width="272" height="73"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Declarative OS configuration
&lt;/h3&gt;

&lt;p&gt;While I had most of the services that I wanted running on the Rocky Linux host, I also wanted to learn declarative Linux configuration, so I provisioned another Beelink mini PC with NixOS. With this OS, you don't need to install extra configuration management tools as the operating system is already declarative, allowing you to configure every system setting from a single code file.&lt;/p&gt;

&lt;p&gt;I wrote the NixOS configuration files to do the following: configure the host to be managed by Ansible, install K3s as an agent with dynamic IP address resolution for the K3s server node, install the Prometheus exporter, and expose the exporter's metrics endpoint to other nodes. I then wrote an Ansible playbook to push these files to the NixOS host, test the configuration, and rebuild the operating system from the configuration.&lt;/p&gt;
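
&lt;p&gt;A condensed sketch of what such a configuration can look like, using the standard NixOS modules, is below. The server address, token path, and SSH key are placeholders.&lt;/p&gt;

```nix
# Condensed sketch of the NixOS configuration described above.
# The server address, token path, and SSH key are placeholders.
{ config, pkgs, ... }:
{
  # Let Ansible manage the host over SSH with passwordless sudo
  services.openssh.enable = true;
  users.users.ansible = {
    isNormalUser = true;
    extraGroups = [ "wheel" ];
    openssh.authorizedKeys.keys = [ "ssh-ed25519 AAAA... ansible@controller" ];
  };
  security.sudo.wheelNeedsPassword = false;

  # Join the K3s cluster as an agent
  services.k3s = {
    enable = true;
    role = "agent";
    serverAddr = "https://k3s-server.lab:6443";
    tokenFile = "/etc/k3s/token";
  };

  # Export node metrics and open the firewall for scraping
  services.prometheus.exporters.node = {
    enable = true;
    port = 9100;
    openFirewall = true;
  };
}
```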

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzlypmickjyyev2u13cy3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzlypmickjyyev2u13cy3.png" alt="image of Ansible playbook for setting up NixOS showing the dynamic IP resolution" width="800" height="136"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Using NixOS was a huge time saver. I estimate that setting up Ansible management, K3s, and the Prometheus exporter on the Rocky Linux host took about a week of evenings and weekends. With NixOS, it took about 3 hours to set up the same three configurations.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftna9zng4ldrjyy9h5lhv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftna9zng4ldrjyy9h5lhv.png" alt="snippet of the nixos code to install and expose Prometheus node exporter" width="800" height="575"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;While learning how to set up the host for Ansible management, I had a hard time understanding how the NixOS sudoers file worked, as the documentation was sparse. After taking a few days to learn how sudo is implemented in NixOS, I compiled my notes and published them as a guide on dev.to. View my guide &lt;a href="https://dev.to/patimapoochai/how-to-edit-the-sudoers-file-in-nixos-with-examples-4k34"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lessons Learned
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Reflecting on the tool choices
&lt;/h3&gt;

&lt;p&gt;The Prometheus + Grafana stack is a good introduction to observability, and I want to dive deeper into this topic by replacing Prometheus with &lt;a href="https://opentelemetry.io/" rel="noopener noreferrer"&gt;OpenTelemetry&lt;/a&gt; sometime in the future.&lt;/p&gt;

&lt;p&gt;Choosing K3s and Helm (later on) was a great tooling choice. I love how K3s makes the process of getting a Kubernetes cluster up and running very easy. Helm has also become an essential tool for deploying services on Kubernetes, and I'm hoping to learn how to deploy services from a Helm repository in the future.&lt;/p&gt;

&lt;p&gt;Ansible was also a great tool for deploying and managing my servers, but I wish it were better. The tool is a step above writing bash scripts, providing idempotency and a large library of ready-to-use modules. However, I can't easily undo changes after they're applied. If I make a mistake in my code, I can't just delete the offending line and rerun the playbook the way you would fix the same mistake in a modern DevOps tool like &lt;a href="https://www.terraform.io/" rel="noopener noreferrer"&gt;Terraform&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;These issues, however, are non-existent on NixOS. The configuration files behaved like a modern cloud infrastructure tool, where rollback is as simple as removing a line and rebuilding the operating system. Ansible can be added to existing systems to manage them as code, but in NixOS, the system is &lt;em&gt;already&lt;/em&gt; code. I'm looking forward to learning &lt;a href="https://nixos.wiki/wiki/flakes" rel="noopener noreferrer"&gt;Nix flakes&lt;/a&gt; in the future to make my configuration even more reproducible across any device.&lt;/p&gt;

&lt;h2&gt;
  
  
  So, still going back to manual configuration?
&lt;/h2&gt;

&lt;p&gt;Going forward, I'm going to double down on always turning manual configurations into code. It just makes everything about building a home lab painless and fun. Have a broken service? You can just re-run the Helm or Docker Compose deployment. The host machine is broken with no obvious fix? Just wipe it and run the Ansible playbooks to rebuild the machine automatically, without spending hours setting it up manually again.&lt;/p&gt;

&lt;p&gt;As long as I have the code to rebuild my lab, I won't feel anxious about my setup even if you decide to toss my devices into the ocean. (I would be confused as to why you would do that, but not anxious.)&lt;/p&gt;

&lt;p&gt;Check out the infrastructure code for this project on &lt;a href="https://github.com/patimapoochai/declarative-homelab-project/tree/main" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>devops</category>
      <category>linux</category>
      <category>docker</category>
    </item>
    <item>
      <title>How I built a home lab without an ethernet port</title>
      <dc:creator>Patima Poochai</dc:creator>
      <pubDate>Thu, 27 Mar 2025 05:41:55 +0000</pubDate>
      <link>https://dev.to/patimapoochai/how-to-run-a-home-lab-without-an-ethernet-port-220j</link>
      <guid>https://dev.to/patimapoochai/how-to-run-a-home-lab-without-an-ethernet-port-220j</guid>
<description>&lt;p&gt;In this blog, I will show you how I set up my home lab network without a direct ethernet port (also known as a &lt;a href="https://networkencyclopedia.com/drop/" rel="noopener noreferrer"&gt;network drop&lt;/a&gt;). I used a WiFi range extender as a bridge to the household's SOHO (small office/home office) router, converting the WiFi signal into a wired ethernet output. I then connected the WiFi extender to an OPNsense router, which provides internet access to devices without WiFi capabilities while routing traffic within the network over a dedicated, wired connection. I also connected another SOHO router to the OPNsense router as an AP (access point) to bring wireless devices onto the network.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvxuqg95l6d3zuwvh4ijm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvxuqg95l6d3zuwvh4ijm.png" alt="Diagram of the network showing all of the connections between the SOHO router, the WiFi extender, the OPNsense router, the Zimaboard, the Switch, and the AP" width="657" height="371"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What scenarios would be best suited for this setup?
&lt;/h2&gt;

&lt;p&gt;This setup is great for scenarios where you don't have a direct ethernet connection between your router and your home lab, and you can't install new network drops to connect them. In my case, my house has a SOHO gateway router that is installed several rooms away from my home lab, and I can't make modifications to my living space, like drilling into the ceiling or through a wall, to run an ethernet cable to my setup. Other cases that would benefit from this setup are when you're renting, dorming, or living in a place that restricts modifications to the plenum space or the structure of the space to install new ethernet ports.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fecw0kv4x19v9i265r78y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fecw0kv4x19v9i265r78y.png" alt="Diagram of the floorplan of my home, showing an icon of the router in the living room, and the area of my room that contains my home lab equipment" width="741" height="651"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As the demarcation point for my house is located near the front of the building, the SOHO router that provides the WiFi connection for my household is very far from where my devices are located, and the only way I could get a wired connection between them is by running the cable through the ceiling.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I chose to set up another router
&lt;/h2&gt;

&lt;p&gt;While I can still run my home lab devices off of the SOHO router, there are a few reasons why I wanted a better setup:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Some home lab devices don't have a good WiFi antenna&lt;/strong&gt; - Some of my devices don't have a powerful enough WiFi card, and as my router is very far away, they can't get a stable network connection. Some home lab equipment that I plan to install in the future doesn't even have WiFi capabilities (like the &lt;a href="https://libre.computer/products/aml-s905x-cc/" rel="noopener noreferrer"&gt;AML-S905X-CC&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Network-heavy tasks will flood the WiFi network&lt;/strong&gt; - Some of my services generate a large amount of network traffic, and it can create congestion on the WiFi network and slow down the internet connection. Since the SOHO router is used by everyone in my family, I can't have my network-heavy tasks running on it. The limited bandwidth of WiFi can also create a bottleneck for high-bandwidth network tasks like NAS (Network Attached Storage) servers and media streaming services.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I wanted to set up a dedicated WiFi extender that consolidates many weak WiFi signals into a single, high-performance connection to my router. I also wanted to keep network-heavy traffic off of the WiFi network and route them via a wired connection instead. By creating a setup to convert the WiFi signals into ethernet, I can get the benefit of having a wired connection for my home lab without spending money and time to set up a direct physical connection to my router.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fun176qzz8r3fa0vp0dby.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fun176qzz8r3fa0vp0dby.png" alt="Diagram visualizing how many devices can flood the WiFi network with signals, while a single WiFi extender can fully use the WiFi channel without disruption" width="710" height="297"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you want a stable, wired connection for your home lab devices, you can follow along with the steps I took to implement this setup below.&lt;/p&gt;

&lt;h2&gt;
  
  
  Set up the physical devices
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Gear list
&lt;/h3&gt;

&lt;p&gt;These are the devices I used for this setup:&lt;/p&gt;

&lt;h4&gt;
  
  
  WiFi range extender: &lt;strong&gt;TP-Link re220&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1t2lgshm9f274fufdskm.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1t2lgshm9f274fufdskm.jpg" alt="Image of a TP-link re220 WiFi range extender" width="800" height="1066"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Router computer: &lt;strong&gt;Zimaboard&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhijoo3r0ao293ibog7qt.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhijoo3r0ao293ibog7qt.jpg" alt="Image of a Zimaboard" width="800" height="1066"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Switch: &lt;strong&gt;TP-Link TL-SG108E managed switch&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhhyp2kl6ctv3gu4pz6tf.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhhyp2kl6ctv3gu4pz6tf.jpg" alt="Image of a TL-SG108E managed switch" width="800" height="713"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You don't have to use these specific devices. You can use whatever best fits your use case, as long as it has the necessary features. For the WiFi extender, the key feature is an ethernet port output. For the router computer, you're looking for a device that is compatible with your desired router OS and has two ethernet ports. The switch doesn't even have to be managed; I went with the TL-SG108E because I want to configure VLANs and other network settings in the future.&lt;/p&gt;

&lt;h3&gt;
  
  
  Set up the WiFi extender
&lt;/h3&gt;

&lt;p&gt;The first step is to set up the WiFi extender to connect it to the SOHO router over WiFi. The individual steps to do so can vary depending on the device you use, so it's best that you follow your device's instructions or manual.&lt;/p&gt;

&lt;p&gt;In my case, I connected my laptop to the WiFi extender via its ethernet port to access the device's web interface. Then, I entered the SSID of my home network and its WiFi password, which the extender uses to connect to the SOHO router's WiFi network.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5tf7882l67d8fysev2ki.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5tf7882l67d8fysev2ki.png" alt="Screenshot of the configuration UI of the WiFi extender showing that it's connected to the SOHO router's WiFi network" width="800" height="510"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You'll know the WiFi extender is connected when a computer plugged into the extender's ethernet port can reach the internet. I just ran a quick &lt;code&gt;ping&lt;/code&gt; command and got replies back.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe2bu0c2ksger4izsor3v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe2bu0c2ksger4izsor3v.png" alt="Terminal snippet of me running the ping command to google.com and getting replies back" width="800" height="260"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After setting up your WiFi extender, here is how your network should look:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb5htj96x4mxjzdsoklnc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb5htj96x4mxjzdsoklnc.png" alt="Diagram of the network showing the SOHO router and its CIDR, the WiFi extender being on the network, and the WiFi extender having its own IP within the home router's CIDR" width="336" height="355"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Set up the router and the switch
&lt;/h3&gt;

&lt;p&gt;Here is how I set up the other devices:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmou0063vfzjbqkgxkweg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmou0063vfzjbqkgxkweg.png" alt="Diagram of the physical setup of the network. The WiFi extender's ethernet output is plugged into the first interface of the Zimaboard. Then, the second interface of the Zimaboard is plugged into a port on the switch. Other devices for the home lab are plugged into the remaining ports on the switch" width="558" height="341"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There are a few key points to this configuration:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The ethernet cable that is coming from the WiFi extender should be plugged into the WAN port of your router. In my case, the OPNsense router can use any port as the WAN interface, so I plugged it into the &lt;code&gt;re0&lt;/code&gt; interface.&lt;/li&gt;
&lt;li&gt;Then, you can connect the other interface (&lt;code&gt;re1&lt;/code&gt;) on your router to the switch. The router will provide internet access and LAN routing via the switch, so you can connect any home lab devices you have to the remaining ports of the switch.&lt;/li&gt;
&lt;li&gt;Lastly, you should place your WiFi extender as close to your home router as possible and point it in the SOHO router's direction. This will ensure that you get the best possible connection between your home router and your home lab network. For my setup, I mounted my WiFi extender on a gooseneck holder, and I pointed it in the direction of the SOHO router.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1owhy4400juurf2cjzi9.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1owhy4400juurf2cjzi9.jpg" alt="Image of the Zimaboard connected to the switch and the WiFi extender" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5uieuv4f4f6yar6j11uw.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5uieuv4f4f6yar6j11uw.jpg" alt="Image of my WiFi Extender mounted onto a gooseneck holder" width="800" height="1066"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Configure the networking layer
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The general steps
&lt;/h3&gt;

&lt;p&gt;I went with OPNsense as the router operating system for this project, but if you don't want to use OPNsense, you can still follow along. The workflow might differ, but the general steps for configuring the router are the same:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Set the interface connected to the WiFi extender as the WAN interface&lt;/li&gt;
&lt;li&gt;Set the interface connected to the switch as the LAN interface&lt;/li&gt;
&lt;li&gt;Make sure a DHCP server is active on the LAN interface&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Configure OPNsense
&lt;/h3&gt;

&lt;h4&gt;
  
  
  1. Install and start OPNsense according to the &lt;a href="https://docs.opnsense.org/manual/install.html" rel="noopener noreferrer"&gt;documentation&lt;/a&gt;
&lt;/h4&gt;

&lt;p&gt;In short, you would flash the installation image onto a USB drive, boot from the USB drive, log into the installation account, and follow the installation wizard.&lt;/p&gt;

&lt;h4&gt;
  
  
  2. Take note of which interface is connected to the WiFi extender and which is connected to the switch
&lt;/h4&gt;

&lt;p&gt;If you recall the physical devices I set up earlier, I connected the WiFi extender to &lt;code&gt;re0&lt;/code&gt; and the switch to &lt;code&gt;re1&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1403at4gvgl5mi6375nx.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1403at4gvgl5mi6375nx.jpg" alt="OPNsense CLI showing the available interfaces" width="800" height="301"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  3. Assign the WAN interface to the port connected to the WiFi extender (&lt;code&gt;re0&lt;/code&gt;) and set the IP address of the WAN interface to DHCP
&lt;/h4&gt;

&lt;p&gt;This will make the OPNsense router take whatever IP the SOHO router assigns to it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffwrn1h6cyoebbmarvebt.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffwrn1h6cyoebbmarvebt.jpg" alt="OPNsense CLI showing the command option to set WAN to DHCP" width="800" height="292"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  4. Assign the LAN interface to the port connected to your switch and use a static IP address for the LAN interface
&lt;/h4&gt;

&lt;p&gt;In OPNsense, setting the static IP address of the LAN interface also determines the network address range for your network. That means you should set the LAN interface's IP to the first non-network address of your desired CIDR (Classless Inter-Domain Routing) range.&lt;/p&gt;

&lt;p&gt;For example, I want my home lab network to have a CIDR range of 10.0.0.0/24, so I would set the static IP address of the LAN interface in OPNsense to 10.0.0.1.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0wii9cdhgb2nm4t17i90.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0wii9cdhgb2nm4t17i90.jpg" alt="OPNsense CLI showing the command to assign 10.0.0.1 to the LAN interface" width="800" height="240"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You should also make sure that the IP range on the LAN interface doesn't overlap with the IP range of your SOHO network, as setting them to overlap will cause routing issues on your home lab network. For example, if my SOHO router uses 192.168.1.0/24, I cannot set my home lab router to use 192.168.1.0/24 as well. However, 192.168.2.0/24 will not overlap with the SOHO router.&lt;/p&gt;
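
&lt;p&gt;If you want to double-check a pair of ranges, the overlap test is easy to script with Python's standard &lt;code&gt;ipaddress&lt;/code&gt; module:&lt;/p&gt;

```python
from ipaddress import ip_network

soho = ip_network("192.168.1.0/24")   # the SOHO router's network

# A lab network that reuses the SOHO range conflicts...
print(soho.overlaps(ip_network("192.168.1.0/24")))  # True
# ...while a different range does not.
print(soho.overlaps(ip_network("192.168.2.0/24")))  # False
print(soho.overlaps(ip_network("10.0.0.0/24")))     # False
```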

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2txdd0fnr0zisfcpkv9k.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2txdd0fnr0zisfcpkv9k.jpg" alt="OPNsense CLI showing the IP for the interface of WAN, and how it's different than the LAN interface" width="800" height="147"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  5. Set up a DHCP server that will run on the LAN interface.
&lt;/h4&gt;

&lt;p&gt;In OPNsense, you set the lower and upper bounds of the IP assignment range. Since my network address is 10.0.0.0/24, the valid client address range would be between 10.0.0.2 and 10.0.0.254 (10.0.0.1 is taken by the LAN interface, and 10.0.0.255 is the broadcast address).&lt;/p&gt;
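
&lt;p&gt;You can verify those bounds with Python's &lt;code&gt;ipaddress&lt;/code&gt; module, which enumerates every usable host address in a network:&lt;/p&gt;

```python
from ipaddress import ip_network

lab = ip_network("10.0.0.0/24")
# hosts() excludes the network address (.0) and the broadcast address (.255)
hosts = list(lab.hosts())

print(hosts[0])    # 10.0.0.1  -> reserved for the LAN interface
print(hosts[1])    # 10.0.0.2  -> lower bound of the DHCP range
print(hosts[-1])   # 10.0.0.254 -> upper bound of the DHCP range
```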

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvh3vssswbl6y1a6yvwqf.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvh3vssswbl6y1a6yvwqf.jpg" alt="OPNsense CLI showing the DHCP server setting the range of client IP addresses" width="800" height="351"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After assigning interfaces and setting up the DHCP server, the WAN interface of the OPNsense router will be assigned an IP address within the SOHO router's WiFi network, making the router appear as just another device on the WiFi. However, the OPNsense router will also serve as the internet gateway for the home lab devices wired to the switch. Traffic within the 10.0.0.0/24 network only passes through the switch, while internet-bound traffic is forwarded to the OPNsense router, which in turn forwards it to your SOHO router over the WiFi connection.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fecprthaioep4v8sa2i5l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fecprthaioep4v8sa2i5l.png" alt="diagram of the two networks, the SOHO and the home lab network, connected by the home lab router and WiFi extender in the middle. An arrow depicts a network packet as it is sent to the home lab router, over WiFi, to the home router, and lastly forwarded to the internet. Each network has its own network range, and the home lab network routes network heavy traffic through the switch without sending out traffic over WiFi" width="711" height="431"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Set up an access point with another SOHO router
&lt;/h2&gt;

&lt;p&gt;We have set up the home lab network to forward wired traffic to the SOHO router over WiFi, but wireless devices cannot connect to the home lab network as is, since the OPNsense router doesn't have a WiFi card. We can fix this by adding an AP device plugged into the switch. It will receive traffic from your wireless devices over WiFi and forward it to the OPNsense router, serving as the bridge between your wireless devices and your wired home lab devices.&lt;/p&gt;

&lt;p&gt;Like before, you can use any AP device as long as it has an ethernet port. In my case, I reused an old ASUS RT-AC68U SOHO router. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flf4i3093fri7sfojhcl5.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flf4i3093fri7sfojhcl5.jpg" alt="picture of the Asus router" width="800" height="1066"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I simply connected an ethernet cable from the switch to the WAN port of the ASUS router. The router detected that it was on a network already managed by another router and automatically switched to AP mode. In this mode, the ASUS router acts as an AP, meaning it only forwards traffic between wireless devices and your home lab router without creating its own network. If your device doesn't automatically switch into AP mode, you can set it manually through the router's management UI.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fucwyhiab4h5t3hxg5hi5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fucwyhiab4h5t3hxg5hi5.png" alt="picture of the Asus router configuration interface showing it's in AP mode" width="768" height="313"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The network should now look like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fafvlft2hqm1sl6tw5koa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fafvlft2hqm1sl6tw5koa.png" alt="diagram of the two network with the addition of the AP" width="712" height="431"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  WiFi channels overlapping issue
&lt;/h3&gt;

&lt;p&gt;The close proximity between the WiFi extender and your AP can cause an issue: the WiFi signals emitted by the two devices can overlap, causing signal collisions and slowing down the connection speed in your network.&lt;/p&gt;

&lt;p&gt;I fixed this issue by setting the AP to use different WiFi channels than the WiFi extender. WiFi devices receive and transmit signals within certain ranges of channels, and you can prevent signal collisions by setting each device to use non-overlapping channels.&lt;/p&gt;

&lt;p&gt;Using &lt;a href="https://f-droid.org/packages/com.vrem.wifianalyzer/" rel="noopener noreferrer"&gt;WiFiAnalyzer from F-Droid&lt;/a&gt;, I saw that the WiFi extender was using channels 34-50 on the 5 GHz band and channels 1-3 on the 2.4 GHz band. If my AP used the same range of channels, it would have to compete with the signals from the WiFi extender and slow down the network.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbdmft04gsgfdt0x34r0g.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbdmft04gsgfdt0x34r0g.jpg" alt="picture of the WiFi analyzer showing that the WiFi extender is running in a specific range on 5GHZ frequency" width="800" height="1650"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiuk1e7480fq3unfnbptr.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiuk1e7480fq3unfnbptr.jpg" alt="picture of the WiFi analyzer showing that the WiFi extender is running in a specific range on 2.4GHZ frequency" width="800" height="1642"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;From WiFiAnalyzer, channels 163-167 on 5 GHz and channels 9-11 on 2.4 GHz appeared to be the least saturated, so I set the AP to use those ranges on each band. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwj6h2b0u9jvgqc3hd94d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwj6h2b0u9jvgqc3hd94d.png" alt="picture of the Asus router configuration interface showing that the interfaces are set to certain channels on 5GHZ frequency" width="800" height="373"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F34nxuwr020tp4ysecw4y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F34nxuwr020tp4ysecw4y.png" alt="picture of the Asus router configuration interface showing that the interfaces are set to certain channels on 2.4GHZ frequency" width="800" height="373"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With this change, both devices should be using different channels, allowing them to work in harmony despite the close proximity.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F84fmqh2parvtt47pp1ki.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F84fmqh2parvtt47pp1ki.jpg" alt="picture of WiFi analyzer showing that AP router is using a different channel than the WiFi EXTENDER on 5GHZ frequency" width="800" height="1650"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo1pkpce7cpevsaisl490.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo1pkpce7cpevsaisl490.jpg" alt="picture of WiFi analyzer showing that AP router is using a different channel than the WiFi EXTENDER on 2.4GHZ frequency" width="800" height="1651"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Takeaways
&lt;/h3&gt;

&lt;p&gt;By adding a few pieces of networking equipment, you can get an internet connection over WiFi while still providing a wired connection between devices and keeping network-heavy traffic off of the air.&lt;/p&gt;

&lt;p&gt;You can connect a WiFi extender to the SOHO network to convert its WiFi signal into a wired ethernet output. From that port, you can then set up a home lab network with OPNsense that provides a fast wired connection between the devices in your home lab.&lt;/p&gt;

&lt;p&gt;If you need to connect wireless devices to your home lab network, you can set up an AP that forwards their traffic to your switch. You can then minimize interference from the close proximity between the AP and the WiFi extender by configuring the two devices to use different WiFi channels.&lt;/p&gt;

&lt;h3&gt;
  
  
  Result
&lt;/h3&gt;

&lt;p&gt;After finishing this setup, here is what the internet speed looks like on a device using the SOHO WiFi:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F040cn55x3yb7kj28hrt5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F040cn55x3yb7kj28hrt5.png" alt="a picture of an internet speed test from a computer connected to the home network" width="800" height="353"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;78 Mbps, pretty good.&lt;/p&gt;

&lt;p&gt;Here is my internet speed using the wired connection:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr377w56cq5gihapwz5q6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr377w56cq5gihapwz5q6.png" alt="picture of a speed test from a device that is wired to the switch" width="800" height="371"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;73.92 Mbps on the wire. While this is slower than using the SOHO router's WiFi directly, all of the internal traffic between home lab devices stays off the WiFi, so I'm content with a ~5 Mbps drop in speed.&lt;/p&gt;

&lt;p&gt;Here is my internet speed from my wireless device connected to the AP:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2nkeppbjuhdtbjcwqpst.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2nkeppbjuhdtbjcwqpst.jpg" alt="picture of my wireless device's internet speed test result" width="800" height="613"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;54 Mbps on wireless devices going through the AP. The speed drop and increased latency make sense here, as I'm using an older router and each packet has to hop across two different WiFi connections and three extra network devices.&lt;/p&gt;

&lt;h3&gt;
  
  
  See some improvements?
&lt;/h3&gt;

&lt;p&gt;If you've tried this setup, feel free to let me know your results. I'm only getting started with the home lab hobby, so if you have any suggestions for improvements, I'd appreciate any feedback in the comments.&lt;/p&gt;

</description>
      <category>networking</category>
      <category>linux</category>
      <category>tutorial</category>
      <category>homelab</category>
    </item>
    <item>
      <title>How to Edit the Sudoers File in NixOS - with Examples</title>
      <dc:creator>Patima Poochai</dc:creator>
      <pubDate>Mon, 24 Mar 2025 00:03:54 +0000</pubDate>
      <link>https://dev.to/patimapoochai/how-to-edit-the-sudoers-file-in-nixos-with-examples-4k34</link>
      <guid>https://dev.to/patimapoochai/how-to-edit-the-sudoers-file-in-nixos-with-examples-4k34</guid>
      <description>&lt;h2&gt;
  
  
  Intro
&lt;/h2&gt;

&lt;p&gt;I wanted to configure the &lt;code&gt;/etc/sudoers&lt;/code&gt; file in NixOS to set up an account that doesn't require a password for sudo, for Ansible management. However, the &lt;a href="https://wiki.nixos.org/wiki/Sudo" rel="noopener noreferrer"&gt;wiki page for sudo&lt;/a&gt; is a bit lacking, so here's everything I know about managing the sudoers file, gathered from trial and error and reading other sources.&lt;/p&gt;

&lt;h2&gt;
  
  
  Basics
&lt;/h2&gt;

&lt;p&gt;If you don't know how to edit the sudoers file normally, I recommend you read &lt;a href="https://www.digitalocean.com/community/tutorials/how-to-edit-the-sudoers-file" rel="noopener noreferrer"&gt;DigitalOcean's guide&lt;/a&gt; first.&lt;/p&gt;

&lt;h3&gt;
  
  
  Boilerplate
&lt;/h3&gt;

&lt;p&gt;Here is how you would write the boilerplate to manage the sudoers file. You should put these lines in your &lt;code&gt;configuration.nix&lt;/code&gt; file or in a Nix module that is imported into &lt;code&gt;configuration.nix&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nix"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nv"&gt;config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;pkgs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt; &lt;span class="p"&gt;}:&lt;/span&gt;

&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nv"&gt;security&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;sudo&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c"&gt;# place top level options (like wheelNeedPassword) here&lt;/span&gt;
        &lt;span class="nv"&gt;enable&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c"&gt;# make sure to enable the sudo package&lt;/span&gt;
        &lt;span class="nv"&gt;execWheelOnly&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="nv"&gt;wheelNeedsPassword&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

        &lt;span class="nv"&gt;extraConfig&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"#includedir /etc/sudoers.d"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c"&gt;# write custom config in here&lt;/span&gt;

        &lt;span class="nv"&gt;extraRules&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="c"&gt;# place sudoers rules here&lt;/span&gt;
        &lt;span class="p"&gt;];&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;

    &lt;span class="c"&gt;# place other configurations outside of the sudo package here&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Key points:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You manage sudoers by setting the configurations inside the &lt;code&gt;security.sudo&lt;/code&gt; module.&lt;/li&gt;
&lt;li&gt;You put all of the sudoers rules in the &lt;code&gt;extraRules&lt;/code&gt; property (there is no defaultRules property).&lt;/li&gt;
&lt;li&gt;You can set other options, like disabling the password prompt for the wheel group, outside of the &lt;code&gt;extraRules&lt;/code&gt; property.&lt;/li&gt;
&lt;li&gt;You can view all possible options for the modules using the man page by running &lt;code&gt;man configuration.nix&lt;/code&gt; and searching for &lt;code&gt;security.sudo&lt;/code&gt;. You can also view the man page online on &lt;a href="https://www.mankier.com/5/configuration.nix" rel="noopener noreferrer"&gt;mankier.com&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  extraRules Template
&lt;/h3&gt;

&lt;p&gt;Here is a template of how you'd write a sudoers rule inside the &lt;code&gt;extraRules&lt;/code&gt; property.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nix"&gt;&lt;code&gt;&lt;span class="nv"&gt;extraRules&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nv"&gt;users&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"sudoers-example"&lt;/span&gt; &lt;span class="p"&gt;];&lt;/span&gt; &lt;span class="c"&gt;# apply this rule to this user&lt;/span&gt;
        &lt;span class="c"&gt;# groups = [ "wheel" ]; # replace the line above with this line to apply the rule to groups&lt;/span&gt;
        &lt;span class="nv"&gt;host&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"ALL"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c"&gt;# host portion of ALL=(ALL:ALL) (i.e. the "ALL=" part), optional&lt;/span&gt;
        &lt;span class="nv"&gt;runAs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"ALL:ALL"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c"&gt;# the "(ALL:ALL)" part in ALL=(ALL:ALL), optional&lt;/span&gt;

        &lt;span class="nv"&gt;commands&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="c"&gt;# takes in a list of commands&lt;/span&gt;
          &lt;span class="s2"&gt;"/run/wrappers/bin/passwd"&lt;/span&gt; &lt;span class="c"&gt;# you can write the commands as only a string&lt;/span&gt;

          &lt;span class="c"&gt;# or write more complex commands uses an attribute set&lt;/span&gt;
          &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nv"&gt;command&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"ALL"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c"&gt;# this would be NOPASSWD: ALL&lt;/span&gt;
            &lt;span class="nv"&gt;options&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"NOPASSWD"&lt;/span&gt; &lt;span class="p"&gt;];&lt;/span&gt; &lt;span class="c"&gt;# don't need the ":" at the end&lt;/span&gt;
          &lt;span class="p"&gt;}&lt;/span&gt; 
        &lt;span class="p"&gt;];&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; 
&lt;span class="p"&gt;];&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Translating from normal configuration to NixOS sudoers
&lt;/h3&gt;

&lt;p&gt;Here is how the normal sudoers rules can be translated into the NixOS configuration.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fukxp76xvdv9larivdh2l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fukxp76xvdv9larivdh2l.png" alt="Colorcoded diagram showing how parts of a normal sudoers rules can be translated into NixOS's configuration language" width="761" height="381"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A few things to note:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;users&lt;/code&gt; field accepts a list of usernames as strings.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;commands&lt;/code&gt; field also accepts a list of commands as strings, and it will transform the list into a single line delimited by commas.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;runAs&lt;/code&gt; field doesn't require parentheses.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Applying the configuration
&lt;/h3&gt;

&lt;p&gt;After writing the NixOS configuration, there are two ways to apply it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Apply the sudo rules to the system temporarily
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;nixos-rebuild &lt;span class="nb"&gt;test&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Permanently apply the sudo rules to the system
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;nixos-rebuild switch
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can then test your configuration with these commands:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Switch to the user account
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;su - USERNAME
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;List the permissions that are assigned to the user with
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo&lt;/span&gt; &lt;span class="nt"&gt;-l&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Common tasks with examples
&lt;/h2&gt;

&lt;p&gt;Here are some common sudoers configurations and how you can write them in NixOS.&lt;/p&gt;

&lt;h3&gt;
  
  
  Make a user a member of the wheel group (fastest way to grant privileges)
&lt;/h3&gt;

&lt;p&gt;First, create the &lt;code&gt;sudoers-example&lt;/code&gt; user and add it to the &lt;code&gt;wheel&lt;/code&gt; group, equivalent to &lt;code&gt;usermod -aG wheel sudoers-example&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nix"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nv"&gt;config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;pkgs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;}:&lt;/span&gt;

&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nv"&gt;users&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;users&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;sudoers-example&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nv"&gt;isNormalUser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nv"&gt;createHome&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nv"&gt;extraGroups&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"wheel"&lt;/span&gt; &lt;span class="p"&gt;];&lt;/span&gt; &lt;span class="c"&gt;# add into wheel&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="o"&gt;...&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then add the wheel group and give it root privileges, equivalent to &lt;code&gt;%wheel ALL=(ALL) ALL&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nix"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="o"&gt;...&lt;/span&gt;
    &lt;span class="nv"&gt;security&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;sudo&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nv"&gt;enable&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

        &lt;span class="nv"&gt;extraRules&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="nv"&gt;groups&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"wheel"&lt;/span&gt; &lt;span class="p"&gt;];&lt;/span&gt;
                &lt;span class="nv"&gt;commands&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"ALL"&lt;/span&gt; &lt;span class="p"&gt;];&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you don't add &lt;code&gt;host = "ALL";&lt;/code&gt; and &lt;code&gt;runAs = "ALL:ALL";&lt;/code&gt;, NixOS sets the &lt;code&gt;host&lt;/code&gt; and &lt;code&gt;runAs&lt;/code&gt; portions to &lt;code&gt;ALL=(ALL:ALL)&lt;/code&gt; by default.&lt;/p&gt;
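
&lt;p&gt;As a sketch, here is the wheel rule from the previous section with those defaults written out explicitly; the behavior is identical to the shorter form:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nix"&gt;&lt;code&gt;extraRules = [
    {
        groups = [ "wheel" ];
        host = "ALL";      # the "ALL=" part (default)
        runAs = "ALL:ALL"; # the "(ALL:ALL)" part (default)
        commands = [ "ALL" ];
    }
];
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
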

&lt;h3&gt;
  
  
  Fastest way to make the wheel group not prompt for a password
&lt;/h3&gt;

&lt;p&gt;The fastest way to make the sudo command work without a password is to assign the user to the &lt;code&gt;wheel&lt;/code&gt; group and set the &lt;code&gt;security.sudo.wheelNeedsPassword&lt;/code&gt; property to &lt;code&gt;false&lt;/code&gt;. I found this property on the &lt;a href="https://discourse.nixos.org/t/dont-prompt-a-user-for-the-sudo-password/9163" rel="noopener noreferrer"&gt;NixOS forum&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nix"&gt;&lt;code&gt;&lt;span class="nv"&gt;security&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;sudo&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c"&gt;# remember the top level options?&lt;/span&gt;
    &lt;span class="nv"&gt;wheelNeedsPassword&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Short configuration to allow a user to run all commands as root
&lt;/h3&gt;

&lt;p&gt;Equivalent to &lt;code&gt;sudoers-example ALL=(ALL:ALL) ALL&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nix"&gt;&lt;code&gt;&lt;span class="nv"&gt;extraRules&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nv"&gt;users&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"sudoers-example"&lt;/span&gt; &lt;span class="p"&gt;];&lt;/span&gt;
        &lt;span class="nv"&gt;commands&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"ALL"&lt;/span&gt; &lt;span class="p"&gt;];&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;];&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Allow users in certain groups to run all commands as root
&lt;/h3&gt;

&lt;p&gt;This is similar to the rule above, but uses the &lt;code&gt;groups&lt;/code&gt; property instead of &lt;code&gt;users&lt;/code&gt;. Equivalent to &lt;code&gt;%administrator ALL=(ALL:ALL) ALL&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nix"&gt;&lt;code&gt;&lt;span class="nv"&gt;extraRules&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nv"&gt;groups&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"administrator"&lt;/span&gt; &lt;span class="p"&gt;];&lt;/span&gt;
        &lt;span class="nv"&gt;commands&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"ALL"&lt;/span&gt; &lt;span class="p"&gt;];&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;];&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Allow a user to use sudo for a specific list of commands
&lt;/h3&gt;

&lt;p&gt;Equivalent to &lt;code&gt;sudoers-example ALL=/usr/bin/useradd, /usr/bin/passwd&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Caveats:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You cannot use the usual Linux paths for commands, like &lt;code&gt;/usr/bin/useradd&lt;/code&gt; for &lt;code&gt;useradd&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;This is because NixOS stores packages in an alternate location called the Nix store, so you have to use each command's path from that store. A quick and dirty workaround is to run &lt;code&gt;which COMMAND&lt;/code&gt; first to get the command's path on NixOS.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nix"&gt;&lt;code&gt;&lt;span class="p"&gt;$&lt;/span&gt; &lt;span class="nv"&gt;which&lt;/span&gt; &lt;span class="nv"&gt;passwd&lt;/span&gt;
&lt;span class="sx"&gt;/run/wrappers/bin/passwd&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nix"&gt;&lt;code&gt;&lt;span class="nv"&gt;extraRules&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;   
        &lt;span class="nv"&gt;users&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"sudoers-example"&lt;/span&gt; &lt;span class="p"&gt;];&lt;/span&gt;
        &lt;span class="nv"&gt;commands&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt; 
          &lt;span class="p"&gt;{&lt;/span&gt;   
            &lt;span class="nv"&gt;command&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"/run/current-system/sw/bin/useradd"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
          &lt;span class="p"&gt;}&lt;/span&gt;   
          &lt;span class="p"&gt;{&lt;/span&gt;   
            &lt;span class="nv"&gt;command&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"/run/wrappers/bin/passwd"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
          &lt;span class="p"&gt;}&lt;/span&gt;   
        &lt;span class="p"&gt;];&lt;/span&gt;  
      &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;];&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Exclude specific commands
&lt;/h3&gt;

&lt;p&gt;This configuration allows the user to change the passwords of all users but restricts them from changing the root user's password, equivalent to &lt;code&gt;sudoers-example ALL=/usr/bin/passwd, ! /usr/bin/passwd root&lt;/code&gt;. Remember to run &lt;code&gt;which COMMAND&lt;/code&gt; first to find the path of the command.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nix"&gt;&lt;code&gt;&lt;span class="nv"&gt;extraRules&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;   
        &lt;span class="nv"&gt;users&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"sudoers-example"&lt;/span&gt; &lt;span class="p"&gt;];&lt;/span&gt;
        &lt;span class="nv"&gt;commands&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt; 
          &lt;span class="p"&gt;{&lt;/span&gt;   
            &lt;span class="nv"&gt;command&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"/run/wrappers/bin/passwd"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c"&gt;# you can run passwd on any user&lt;/span&gt;
          &lt;span class="p"&gt;}&lt;/span&gt;   
          &lt;span class="p"&gt;{&lt;/span&gt;   
            &lt;span class="nv"&gt;command&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"! /run/wrappers/bin/passwd root"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c"&gt;# but can't run passwd on root&lt;/span&gt;
          &lt;span class="p"&gt;}&lt;/span&gt;   
        &lt;span class="p"&gt;];&lt;/span&gt;  
      &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;];&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Allow a user to run all commands without a password
&lt;/h3&gt;

&lt;p&gt;Equivalent to &lt;code&gt;sudoers-example ALL=(ALL:ALL) NOPASSWD: ALL&lt;/code&gt;. Notice how the tag_spec name (&lt;code&gt;NOPASSWD&lt;/code&gt;) doesn't require a &lt;code&gt;:&lt;/code&gt; at the end.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nix"&gt;&lt;code&gt;&lt;span class="nv"&gt;extraRules&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;   
        &lt;span class="nv"&gt;users&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"sudoers-example"&lt;/span&gt; &lt;span class="p"&gt;];&lt;/span&gt; 
        &lt;span class="nv"&gt;commands&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt; 
          &lt;span class="p"&gt;{&lt;/span&gt;   
            &lt;span class="nv"&gt;command&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"ALL"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="nv"&gt;options&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"NOPASSWD"&lt;/span&gt; &lt;span class="p"&gt;];&lt;/span&gt; &lt;span class="c"&gt;# don't need the ":" at the end &lt;/span&gt;
          &lt;span class="p"&gt;}&lt;/span&gt;   
        &lt;span class="p"&gt;];&lt;/span&gt;  
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;];&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Require a password for all commands, but no password for certain commands
&lt;/h3&gt;

&lt;p&gt;Equivalent to &lt;code&gt;sudoers-example ALL=(ALL:ALL) PASSWD: ALL, NOPASSWD: /usr/sbin/modprobe&lt;/code&gt;. The user needs to enter their password for all commands except &lt;code&gt;modprobe&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nix"&gt;&lt;code&gt;&lt;span class="nv"&gt;extraRules&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;   
        &lt;span class="nv"&gt;users&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"sudoers-example"&lt;/span&gt; &lt;span class="p"&gt;];&lt;/span&gt; &lt;span class="c"&gt;# applies the first column of the sudoers line&lt;/span&gt;
        &lt;span class="nv"&gt;commands&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt; 
          &lt;span class="p"&gt;{&lt;/span&gt;   
            &lt;span class="nv"&gt;command&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"ALL"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="nv"&gt;options&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"PASSWD"&lt;/span&gt; &lt;span class="p"&gt;];&lt;/span&gt;
          &lt;span class="p"&gt;}&lt;/span&gt;   
          &lt;span class="p"&gt;{&lt;/span&gt;   
            &lt;span class="nv"&gt;command&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"/run/current-system/sw/bin/modprobe"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c"&gt;# allow loading and unloading of kernel modules&lt;/span&gt;
            &lt;span class="nv"&gt;options&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"NOPASSWD"&lt;/span&gt; &lt;span class="p"&gt;];&lt;/span&gt;
          &lt;span class="p"&gt;}&lt;/span&gt;   
        &lt;span class="p"&gt;];&lt;/span&gt;  
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;];&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Prevent commands from spawning subcommands
&lt;/h3&gt;

&lt;p&gt;A user can bypass sudo's restrictions by running an allowed command and having it spawn a subcommand that inherits root privileges, even if sudo would have blocked that subcommand directly. For example, as shown in the &lt;a href="https://www.digitalocean.com/community/tutorials/how-to-edit-the-sudoers-file" rel="noopener noreferrer"&gt;DigitalOcean article&lt;/a&gt;, you can run &lt;code&gt;less&lt;/code&gt; with sudo and then spawn a bash shell within it that has root privileges.&lt;/p&gt;

&lt;p&gt;You can prevent users from spawning subcommands using the &lt;code&gt;NOEXEC&lt;/code&gt; tag_spec in sudo.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nix"&gt;&lt;code&gt;&lt;span class="nv"&gt;extraRules&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;   
        &lt;span class="nv"&gt;users&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"sudoers-example"&lt;/span&gt; &lt;span class="p"&gt;];&lt;/span&gt;
        &lt;span class="nv"&gt;commands&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt; 
          &lt;span class="p"&gt;{&lt;/span&gt;   
            &lt;span class="nv"&gt;command&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"/run/current-system/sw/bin/less"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="nv"&gt;options&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"NOEXEC"&lt;/span&gt; &lt;span class="p"&gt;];&lt;/span&gt; &lt;span class="c"&gt;# apply a tag_spec that prevent spawning child processes&lt;/span&gt;
          &lt;span class="p"&gt;}&lt;/span&gt;   
        &lt;span class="p"&gt;];&lt;/span&gt;  
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;];&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now you can't execute other commands in &lt;code&gt;less&lt;/code&gt; by typing &lt;code&gt;! COMMAND&lt;/code&gt;.&lt;/p&gt;
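
&lt;p&gt;For reference, the sudoers line this rule generates should look roughly like the following (assuming the module's default run-as specification):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudoers-example ALL=(ALL:ALL) NOEXEC: /run/current-system/sw/bin/less
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;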

&lt;h3&gt;
  
  
  Create sudoers aliases for user groups, commands, and run-as
&lt;/h3&gt;

&lt;p&gt;Aliases are a sudo feature similar to a variable: a single name that refers to a list of items. There's no NixOS property that specifically sets the &lt;code&gt;User_Alias&lt;/code&gt;, &lt;code&gt;Cmnd_Alias&lt;/code&gt;, or &lt;code&gt;Runas_Alias&lt;/code&gt; aliases, but you can use the &lt;code&gt;extraConfig&lt;/code&gt; property to define them as custom text. NixOS will then append the lines from this property to the sudoers file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nix"&gt;&lt;code&gt;&lt;span class="nv"&gt;security&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;sudo&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nv"&gt;enable&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="nv"&gt;extraConfig&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
    &lt;span class="s2"&gt;''  &lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;&lt;span class="s2"&gt;    User_Alias    ADMINGROUP = sudoers-example # define aliasses here&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;&lt;span class="s2"&gt;    ''&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="nv"&gt;extraRules&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt; 
      &lt;span class="p"&gt;{&lt;/span&gt;   
        &lt;span class="nv"&gt;users&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"ADMINGROUP"&lt;/span&gt; &lt;span class="p"&gt;];&lt;/span&gt; &lt;span class="c"&gt;# will resolve to sudoers-example&lt;/span&gt;
        &lt;span class="nv"&gt;commands&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"ALL"&lt;/span&gt; &lt;span class="p"&gt;];&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
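
&lt;p&gt;With this configuration, the generated sudoers file should contain something like the following (the exact run-as defaults depend on the module version):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User_Alias    ADMINGROUP = sudoers-example
ADMINGROUP ALL=(ALL:ALL) ALL
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;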



&lt;p&gt;Other aliases should work too.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nix"&gt;&lt;code&gt;    &lt;span class="nv"&gt;extraConfig&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
    &lt;span class="s2"&gt;''  &lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;&lt;span class="s2"&gt;    User_Alias    GROUP = user1, user2&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;&lt;span class="s2"&gt;    Cmnd_Alias    KERNEL = /run/current-system/sw/bin/modprobe, /run/current-system/sw/bin/modinfo&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;&lt;span class="s2"&gt;    Runas_Alias   VIRT = kvm&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;&lt;span class="s2"&gt;    ''&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Set other settings of the sudoers file
&lt;/h3&gt;

&lt;p&gt;If you want to add custom configurations that aren't implemented in NixOS's sudo module, you can also use the &lt;code&gt;extraConfig&lt;/code&gt; property. For example, if you want to add &lt;code&gt;/etc/sudoers.d&lt;/code&gt; as a drop-in directory where sudo will search for extra configuration files, you can add a multi-line string in the normal sudoers configuration language to the &lt;code&gt;extraConfig&lt;/code&gt; property.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nix"&gt;&lt;code&gt;&lt;span class="nv"&gt;security&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;sudo&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nv"&gt;extraConfig&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
    &lt;span class="s2"&gt;''  &lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;&lt;span class="s2"&gt;    #includedir /etc/sudoers.d&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;&lt;span class="s2"&gt;    ''&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Closing Thoughts
&lt;/h2&gt;

&lt;p&gt;Configuring sudo in NixOS might be confusing at first, but you can master the process easily if you practice writing a few sudoers rules and reference the man page of the &lt;code&gt;configuration.nix&lt;/code&gt; file by running &lt;code&gt;man configuration.nix&lt;/code&gt;. &lt;/p&gt;

&lt;p&gt;Going back to the purpose of this blog post, we can now write the &lt;code&gt;configuration.nix&lt;/code&gt; file to create a user called &lt;code&gt;ansible&lt;/code&gt; and allow this user to use sudo without asking for the password like so:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nix"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nv"&gt;config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;pkgs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt; &lt;span class="p"&gt;}:&lt;/span&gt;

&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c"&gt;# create "ansible" user&lt;/span&gt;
  &lt;span class="nv"&gt;users&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;users&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;ansible&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nv"&gt;isNormalUser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nv"&gt;home&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"/home/ansible"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nv"&gt;openssh&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;authorizedKeys&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;keys&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"ssh-rsa PUBLICKEY"&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;

  &lt;span class="c"&gt;# set up sudo to not ask for a password&lt;/span&gt;
  &lt;span class="nv"&gt;security&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;sudo&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; 
    &lt;span class="nv"&gt;enable&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nv"&gt;extraRules&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt; 
      &lt;span class="p"&gt;{&lt;/span&gt;   
        &lt;span class="nv"&gt;users&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"ansible"&lt;/span&gt; &lt;span class="p"&gt;];&lt;/span&gt;
        &lt;span class="nv"&gt;commands&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt; 
          &lt;span class="p"&gt;{&lt;/span&gt;   
            &lt;span class="nv"&gt;command&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"ALL"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="nv"&gt;options&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"NOPASSWD"&lt;/span&gt; &lt;span class="p"&gt;];&lt;/span&gt;
          &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;];&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;];&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
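
&lt;p&gt;For reference, this configuration should produce a sudoers rule roughly equivalent to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ansible ALL=(ALL:ALL) NOPASSWD: ALL
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;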



</description>
      <category>linux</category>
      <category>nixos</category>
      <category>tutorial</category>
      <category>tooling</category>
    </item>
    <item>
      <title>How to fix Prometheus "open /prometheus/queries.active: permission denied" on Kubernetes: step-by-step</title>
      <dc:creator>Patima Poochai</dc:creator>
      <pubDate>Mon, 17 Mar 2025 04:34:08 +0000</pubDate>
      <link>https://dev.to/patimapoochai/how-to-fix-prometheus-open-prometheusqueriesactive-permission-denied-on-kubernetes-22gf</link>
      <guid>https://dev.to/patimapoochai/how-to-fix-prometheus-open-prometheusqueriesactive-permission-denied-on-kubernetes-22gf</guid>
      <description>&lt;p&gt;I learned how to diagnose a new Prometheus + Kubernetes issue today, here's a summary of what I did.&lt;/p&gt;

&lt;h2&gt;
  
  
  Context
&lt;/h2&gt;

&lt;p&gt;I'm trying to install the Prometheus monitoring tool to my k3s cluster using a Helm chart, and I want to store the metrics data in a volume that is mounted to the &lt;code&gt;/prometheus&lt;/code&gt; directory inside the container. I created a volume that is mounted locally to the &lt;code&gt;/home/ansible/prometheus/data&lt;/code&gt; directory on the host machine using the &lt;code&gt;rancher.io/local-path&lt;/code&gt; storage class.&lt;/p&gt;

&lt;p&gt;The PersistentVolume and Deployment manifest would look something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PersistentVolume&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;storageClassName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;local-path&lt;/span&gt;
&lt;span class="na"&gt;   local&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="na"&gt;         path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/home/ansible/prometheus/data&lt;/span&gt; &lt;span class="c1"&gt;# directory that is mounted on the host&lt;/span&gt;
&lt;span class="nn"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;apps/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deployment&lt;/span&gt;
&lt;span class="nn"&gt;...&lt;/span&gt;
  &lt;span class="na"&gt;template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{{&lt;/span&gt; &lt;span class="nv"&gt;.Values.image.repository&lt;/span&gt; &lt;span class="pi"&gt;}}&lt;/span&gt;&lt;span class="s"&gt;/prometheus:{{ .Chart.AppVersion }}&lt;/span&gt;
        &lt;span class="na"&gt;volumeMounts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;mountPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/prometheus&lt;/span&gt; &lt;span class="c1"&gt;# the volume will be mounted here in the container&lt;/span&gt;
          &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;prometheus-pvc&lt;/span&gt;
&lt;span class="nn"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These manifests create a volume that stores the data on the host computer running k3s, and mount that volume to the &lt;code&gt;/prometheus&lt;/code&gt; path inside the container.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;The Helm chart installed without an issue, but I got this error when checking the status of the Prometheus pod:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;NAME                        READY   STATUS             RESTARTS       AGE&lt;span class="se"&gt;\&lt;/span&gt;
heimdall-788f5f64c8-mh2lb   1/1     Running            1 &lt;span class="o"&gt;(&lt;/span&gt;4d4h ago&lt;span class="o"&gt;)&lt;/span&gt;   4d4h&lt;span class="se"&gt;\&lt;/span&gt;
prom-tst-6885c4dc8f-kzzmc   0/1     CrashLoopBackOff   4 &lt;span class="o"&gt;(&lt;/span&gt;56s ago&lt;span class="o"&gt;)&lt;/span&gt;    2m19s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I looked at the logs of the pod:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1b0aqv3jrm4zqqr8bspq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1b0aqv3jrm4zqqr8bspq.png" width="800" height="205"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What caused the pod to be in the CrashLoopBackOff state was the error: &lt;code&gt;open /prometheus/queries.active: permission denied&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;It seems Prometheus couldn't create the files it needs in the container's &lt;code&gt;/prometheus&lt;/code&gt; directory because it lacked the necessary permissions. This reminded me of a section from an RHCSA book explaining that you can't bind mount a host directory into a container unless it has the correct permissions and an owner whose UID matches the user inside the container. Since Prometheus' volume is backed by a local directory on the host machine, that could be why Prometheus is getting this error.&lt;/p&gt;

&lt;p&gt;The book recommended that I run &lt;code&gt;podman unshare&lt;/code&gt; to show the UID of the user inside the container, and set the owner of the directory on the host to have the same UID, but I don't know the equivalent command in Docker to get the container user's UID.&lt;/p&gt;

&lt;p&gt;I tried looking through Prometheus' &lt;a href="https://github.com/prometheus/prometheus/blob/b0227d1f16ea5da448f7a610ed9a7e22e6f35782/Dockerfile#L17" rel="noopener noreferrer"&gt;Dockerfile on Github&lt;/a&gt; to see if I could find the UID of the container user somewhere, but I found something else instead:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwibiej0vwc3ytx7ucvfg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwibiej0vwc3ytx7ucvfg.png" alt="Dockerfile content showing that the user inside the container is called nobody" width="723" height="93"&gt;&lt;/a&gt;&lt;/p&gt;
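
&lt;p&gt;The relevant part of the Dockerfile looks roughly like this (paraphrased from the linked source, not an exact copy):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;RUN chown -R nobody:nobody /etc/prometheus /prometheus
USER       nobody
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;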

&lt;p&gt;Looks like the user account of the container is set to &lt;code&gt;nobody&lt;/code&gt;, and the &lt;code&gt;/prometheus&lt;/code&gt; directory inside the container is also owned by &lt;code&gt;nobody&lt;/code&gt;. Maybe the container expects that directory to still be owned by &lt;code&gt;nobody&lt;/code&gt; when it's being used, but who owns that directory on the host machine now?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;[&lt;/span&gt;ansible@nextcloud prometheus]&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;ls&lt;/span&gt; &lt;span class="nt"&gt;-la&lt;/span&gt;
total 4
drwxr-xr-x.  3 ansible ansible   18 Mar 13 19:08 &lt;span class="nb"&gt;.&lt;/span&gt;
drwx------. 13 ansible ansible 4096 Mar 13 19:08 ..
drwxr-xr-x.  4 ansible ansible   70 Mar 13 19:15 data

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Inside the &lt;code&gt;/home/ansible/prometheus&lt;/code&gt; directory, the &lt;code&gt;data&lt;/code&gt; directory is owned by the user &lt;code&gt;ansible&lt;/code&gt;. Just a hunch, but I can try changing the owner and group of the directory on the host machine to &lt;code&gt;nobody&lt;/code&gt;, then removing the failing pod.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;[&lt;/span&gt;ansible@nextcloud prometheus]&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;sudo chown&lt;/span&gt; &lt;span class="nt"&gt;-R&lt;/span&gt; nobody:nobody data
&lt;span class="o"&gt;[&lt;/span&gt;ansible@nextcloud prometheus]&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;ls&lt;/span&gt; &lt;span class="nt"&gt;-la&lt;/span&gt;
total 4
drwxr-xr-x.  3 ansible ansible   18 Mar 13 19:08 &lt;span class="nb"&gt;.&lt;/span&gt;
drwx------. 13 ansible ansible 4096 Mar 13 19:08 ..
drwxr-xr-x.  4 nobody  nobody    70 Mar 13 19:15 data
&lt;span class="o"&gt;[&lt;/span&gt;ansible@nextcloud prometheus]&lt;span class="err"&gt;$&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;localhost@computer ~/P/Homelab&amp;gt; k get po
NAME                        READY   STATUS             RESTARTS        AGE
heimdall-788f5f64c8-mh2lb   1/1     Running            1 &lt;span class="o"&gt;(&lt;/span&gt;4d4h ago&lt;span class="o"&gt;)&lt;/span&gt;    4d4h
prom-tst-6885c4dc8f-kzzmc   0/1     CrashLoopBackOff   8 &lt;span class="o"&gt;(&lt;/span&gt;4m28s ago&lt;span class="o"&gt;)&lt;/span&gt;   20m
localhost@computer ~/P/Homelab&amp;gt; k delete po prom-tst-6885c4dc8f-kzzmc  
pod &lt;span class="s2"&gt;"prom-tst-6885c4dc8f-kzzmc"&lt;/span&gt; deleted
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And voila, that configuration seems to be what Prometheus needed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;localstoat@thinkpad-e495 ~/P/Homelab&amp;gt; k get po
NAME                        READY   STATUS    RESTARTS       AGE
heimdall-788f5f64c8-mh2lb   1/1     Running   1 &lt;span class="o"&gt;(&lt;/span&gt;4d4h ago&lt;span class="o"&gt;)&lt;/span&gt;   4d4h
prom-tst-6885c4dc8f-hghv7   1/1     Running   0              68s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And Prometheus is now reachable at the host's FQDN on the service's NodePort:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9vzqkgxt7x4wpaqqafwj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9vzqkgxt7x4wpaqqafwj.png" alt="Prometheus web UI is now accessible" width="800" height="277"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Key takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;If you're getting a &lt;code&gt;permission denied&lt;/code&gt; error on a Kubernetes deployment that uses a local-path storage class, make sure that &lt;strong&gt;the directory backing the PersistentVolume is owned by the same user as the one inside the container&lt;/strong&gt; and has the necessary permissions. Otherwise, the application will keep getting &lt;code&gt;permission denied&lt;/code&gt; errors when writing to that directory.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;rancher.io/local-path&lt;/code&gt; really just uses a directory bind mount under the hood, so the Linux administration best practices for making bind mounts work correctly on the host still apply in Kubernetes.&lt;/li&gt;
&lt;li&gt;You can look inside the service's Dockerfile to get information about the user and directory permissions of the container.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Extra: codifying this troubleshooting into configuration management
&lt;/h2&gt;

&lt;p&gt;To make this whole troubleshooting journey worth it, I modified my Ansible playbook that is used to deploy the Prometheus Helm chart to automatically set the directory owner to be &lt;code&gt;nobody&lt;/code&gt;. If I have to reinstall the Prometheus Helm chart again, my configuration management tool will apply this fix automatically without manual intervention:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Setup prometheus with helm&lt;/span&gt;
&lt;span class="na"&gt; hosts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rocky,&amp;amp;prometheus&lt;/span&gt;
&lt;span class="na"&gt; tasks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="na"&gt; - name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;add directory for bind mount volumes&lt;/span&gt;
&lt;span class="na"&gt;   file&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="na"&gt;     path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/home/ansible/prometheus/data&lt;/span&gt;
&lt;span class="na"&gt;     owner&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nobody&lt;/span&gt; &lt;span class="c1"&gt;# added to set directory owner to be the "nobody" account&lt;/span&gt;
&lt;span class="na"&gt;     group&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nobody&lt;/span&gt; &lt;span class="c1"&gt;# added to set group owner to be the "nobody" account&lt;/span&gt;
&lt;span class="na"&gt;     state&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;directory&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  TLDR
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What caused this
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;The directory used by the PersistentVolume of "rancher.io/local-path" storage class doesn't have the same owner as the user in the Prometheus container.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Why
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;The user in the container is restricted from reading and writing files in the directory on the host machine that stores the data for the local-path volume.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  How to fix it
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Run &lt;code&gt;chown&lt;/code&gt; on the host machine to change the owner of the directory to whatever the user inside the container is; in this case, it's "nobody".
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo chmod&lt;/span&gt; &lt;span class="nt"&gt;-R&lt;/span&gt; &amp;lt;DirectoryOfTheVolume&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Delete the failing pod (and recreate it if you're not using a Deployment)
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl delete pod &amp;lt;PodName&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>kubernetes</category>
      <category>linux</category>
      <category>tutorial</category>
      <category>devops</category>
    </item>
    <item>
      <title>Creating an AWS + NextJS site for the Cloud Resume Challenge</title>
      <dc:creator>Patima Poochai</dc:creator>
      <pubDate>Tue, 24 Dec 2024 04:02:18 +0000</pubDate>
      <link>https://dev.to/patimapoochai/creating-a-nextjs-aws-site-for-the-cloud-resume-challenge-5121</link>
      <guid>https://dev.to/patimapoochai/creating-a-nextjs-aws-site-for-the-cloud-resume-challenge-5121</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5coe389gvl5g65gwl7od.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5coe389gvl5g65gwl7od.png" alt="Title card of the project with the text " width="780" height="520"&gt;&lt;/a&gt;&lt;br&gt;
I've recently completed the &lt;a href="https://cloudresumechallenge.dev/docs/the-challenge/aws/" rel="noopener noreferrer"&gt;Cloud Resume Challenge&lt;/a&gt; created by Forrest Brazeal. The challenge involves building a resume website using modern cloud technologies by completing a set of challenges called "Chunks". For my version, I created a static resume site in NextJS that is hosted on S3 and Cloudfront with a user count tracking feature that is implemented with Lambda and DynamoDB. I also applied modern DevOps principles by deploying the infrastructure as code with Terraform, creating a CI/CD pipeline using GitHub Actions, and performing end-to-end testing with Cypress. Working through this challenge felt like a breath of fresh air. While I’ve worked on many web projects throughout my college years, I haven’t gained as much insight and new perspectives on software development as I did from working through this challenge.&lt;/p&gt;

&lt;h2&gt;
  
  
  Chunk 1: The Front End
&lt;/h2&gt;

&lt;p&gt;The Cloud Resume Challenge is divided into four chunks, each containing a set of steps to build one component of the project. I started with the first chunk by building the front end of the resume website in NextJS. The challenge only requires a basic HTML/CSS website, but I chose NextJS because, having worked on web projects in the past, I knew a pure HTML/CSS site would be more difficult to maintain than one built on a production-grade framework. I also decided on this framework for its flexibility in creating both server-side and static websites, and because I wanted to gain experience with it for future projects. Finally, I chose to implement the frontend components as Infrastructure as Code (IaC) with Terraform earlier than required. This decision made learning the tool easier, as it let me write IaC against a simpler architecture, and it saved me time when I had to convert the entire project to IaC later on.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd55iodequfm28n702vb6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd55iodequfm28n702vb6.png" alt="The resume website side-by-side with the source code written in Javascript" width="800" height="383"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then, I deployed the site to AWS using an S3 bucket and cached the website on the CloudFront CDN. I also secured web requests to the site by implementing HTTPS with AWS Certificate Manager. So far, this was nothing too different from my past projects, but as I moved on to the next chunks, my past experience would become less and less useful.&lt;/p&gt;

&lt;h2&gt;
  
  
  Chunk 2: The Back End
&lt;/h2&gt;

&lt;p&gt;The main feature of chunks 2 and 3 is the visitor counter. The challenge requires a persistent counter that shows the number of visitors who have viewed the page and updates each time the page is refreshed. I began working on the backend of this feature in chunk 2 by writing a Lambda function in Python that stores and updates the count as a statistics record in DynamoDB each time a visitor views the site. The Lambda function is invoked by HTTP POST calls to a public AWS API Gateway endpoint.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9vnqf0x5cb25wn8p35ls.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9vnqf0x5cb25wn8p35ls.png" alt="Diagram of the visitor count widget sending data to the AWS back end services" width="800" height="415"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I also added the extra functionality of a counter that displays the number of unique visitors to the site. A flaw with the regular visitor counter is that it increments on every refresh, so a user can inflate the number by refreshing the page repeatedly. I created a second counter that tracks unique visitors by caching each visitor's IP address and counting a user as unique only if their IP address is not already in the cache. I also preserve each visitor's anonymity by storing the IP addresses as hashes in the database. This extra interactivity can make visitors feel more welcome by having the website accurately recall their previous interactions with it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F76kcfh1q3glxf78tns1y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F76kcfh1q3glxf78tns1y.png" alt="Step-by-step diagram of how the unique visitor count widget stores identify the unique IP of each user" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Chunk 3: Integrating Both Ends
&lt;/h2&gt;

&lt;p&gt;While the backend infrastructure of the visitor counter was completed in the previous chunk, I still needed the website to retrieve the data from the backend. My work in chunk 3 centered on making the counter display the correct number of visitors and on writing smoke tests for both the front end and the back end. I made the website query the visitor count by using JavaScript to send HTTP POST requests to the API Gateway endpoints.&lt;/p&gt;

&lt;p&gt;However, new challenges arose when I had to write tests for the website. In my past college projects, quality wasn't as important as submitting the project: as long as it was done enough and handed in on time, it counted as “completed,” even if it had a high chance of being non-functional. With this focus on getting things out the door rather than on functionality, I never considered writing tests. I always tested functionality by hand, which led me to implement incomplete and broken features that required more patches later. In the worst cases, those patches, verified only by more manual testing, introduced further issues that required still more patches down the road. Having worked on many projects where quality didn't matter, I assumed testing was a waste of time that could be better spent troubleshooting issues as they appeared.&lt;/p&gt;

&lt;p&gt;My mindset began to change when chunk 3 required me to write tests for the API Gateway. The challenge calls for "smoke tests": end-to-end tests, written in Cypress, that exercise the full functionality of the website. I had to think about what could go wrong in my code, such as how an HTTP request might contain malformed headers, or what happens when the Lambda code increments the visitor count before the database has been initialized.&lt;/p&gt;

&lt;p&gt;Thinking about the potential issues and writing tests to detect them made me realize the benefit of testing. A framework like Cypress ensures that each test runs consistently and correctly, while letting me run the full suite in a single click. It offloads the toil and prevents the errors of manual testing, giving me more time to work on the features of the website. Writing tests that ensure the website functions properly became as important as writing the code for the website itself. The focus changed from just "getting it done" to using tools that empower me to create quality work while making it easy to do so.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo7jfr8ba79w8sjp7e6zi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo7jfr8ba79w8sjp7e6zi.png" alt="A Cypress GUI showing a passing test run of the resume website" width="800" height="490"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;During this chunk, I also completed the task of earning an AWS certification. The challenge originally requires the AWS Certified Cloud Practitioner certificate, but I already held it, so to further my learning I decided to obtain the associate-level AWS Certified Solutions Architect certificate. In hindsight, this was a good decision: I learned how AWS services work and how they integrate with each other in more depth than the practitioner-level certificate covers.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F582bp1872kyu55ogtlma.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F582bp1872kyu55ogtlma.png" alt="AWS Certified Solutions Architect certificate addressed to Patima Poochai" width="800" height="618"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Chunk 4: The Last Stretch
&lt;/h2&gt;

&lt;p&gt;On the last stretch of the challenge, I migrated the remaining backend infrastructure to IaC with Terraform. I also created a CI/CD pipeline using GitHub Actions to manage the code hosted on GitHub. GitHub Actions deploys the code to AWS and runs the end-to-end tests automatically whenever new code is committed to either repository. This solved another issue common to my past projects: whenever someone committed broken code, it took a long time to identify what was broken and to track down which commit caused it. There were even times when people submitted more broken commits while I was still troubleshooting the initial issue. A CI/CD system that automatically checked each commit and flagged the one that broke the build would have saved me many sleepless nights spent troubleshooting unknown errors.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxhd7d6lmn49kl9yh43sd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxhd7d6lmn49kl9yh43sd.png" alt="A successful run of the Github Actions" width="800" height="554"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;One major problem in this chunk was troubleshooting incorrect permissions in the IAM policy for GitHub Actions. The pipeline needed an AWS role with permission to access every service the Terraform deployment touches, and it was unclear which permissions were needed across seven different AWS services. At first, I figured I could attempt a deployment, read the errors Terraform outputs, and manually add the missing permissions to the IAM policy. However, this was time-consuming: each error message shows only one missing permission out of possibly hundreds, and each pipeline run took around 10 minutes to attempt a deployment and surface its errors. I could have kept deploying and adding permissions one by one, as I would have in my past projects, but with how much this project had already challenged my thinking, I wanted to see if there was a better way to solve the problem.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8bf4mp3qrrztxoi7f4zc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8bf4mp3qrrztxoi7f4zc.png" alt="An error message that describes the missing permissions needed to run Terraform deployment to AWS" width="800" height="352"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At first, I tried capturing the needed permissions with AWS CloudTrail, the built-in API call logging tool. I used Terraform to deploy the website to a separate test account and logged the API calls it made, planning to use their names to write the IAM permissions policy. However, this didn't work: API call names aren't always valid permission names in an IAM policy, and some required permissions were missing from the logs entirely. The challenge didn't recommend any way around this problem, so I was uncertain whether I'd find one.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff9w845236ysxa01o9yai.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff9w845236ysxa01o9yai.png" alt="AWS Cloudtrail showing the logged API calls that was used by Terraform" width="800" height="478"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But then I found a tool called “&lt;a href="https://github.com/iann0036/iamlive" rel="noopener noreferrer"&gt;iamlive&lt;/a&gt;”. It works by creating an HTTP proxy on your local machine, capturing the API calls Terraform makes through the proxy, and writing them out as a text file formatted as an IAM policy. At first, I wasn't confident the tool would work, as I couldn't get the proxy running with the recommended command-line arguments. After spending some time figuring out which arguments my setup needed, I got it to capture the IAM permissions. With this tool, I finished the chunk by using the captured permissions to write IAM policies that let GitHub Actions deploy the Terraform infrastructure to AWS.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuwkxa11o0kkey7qt0nzl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuwkxa11o0kkey7qt0nzl.png" alt="Live output of the permissions when running Terraform through iamlive's proxy" width="800" height="555"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;By taking on this challenge, I learned a lot about the tools used in cloud operations and about modern DevOps principles. I sped up my infrastructure deployment with Terraform, removed the toil of manual testing by writing end-to-end tests in Cypress, and minimized time spent troubleshooting with a CI/CD pipeline built on GitHub Actions. I've also learned that, by being willing to break away from the bad habits built up in my past projects, I can change my workflow for the better.&lt;/p&gt;

&lt;h2&gt;
  
  
  See Also
&lt;/h2&gt;

&lt;p&gt;View the resume site &lt;a href="https://resume.patimapoochai.net/" rel="noopener noreferrer"&gt;here&lt;/a&gt;&lt;br&gt;
Check out the project code below:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/patimapoochai/cloud-resume-challenge" rel="noopener noreferrer"&gt;Front End Source Code&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/patimapoochai/cloud-resume-backend" rel="noopener noreferrer"&gt;Back End Source Code&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>aws</category>
      <category>webdev</category>
      <category>serverless</category>
      <category>terraform</category>
    </item>
  </channel>
</rss>
