Building a Ceph cluster on Proxmox VE in a home lab is a fantastic proving ground. The catch is that Ceph is an enterprise-grade system at heart – and by default, it demands entire raw physical drives (block devices) for its OSDs (Object Storage Daemons).
But what if your "IT Bunker" only has one physical server with a single extra 111 GB SSD, and you want to simulate a full-fledged 3-node cluster? My architecture looked like this:
- admin (Physical Proxmox host)
- admin-02 (Nested VM on the main server)
- admin-03 (VM... running remotely on a separate desktop PC)
Here is the story of how I "cheated" Ceph's strict mechanics, sliced a single drive into pieces, bypassed hypervisor GUI errors, and ultimately achieved the coveted HEALTH_OK status.
Challenge 1: Handing an LVM Partition over to Ceph
My plan was to split the physical sdc (111.8 GB) drive on the main server into three 35 GB logical volumes using LVM (Logical Volume Manager). Here is the lsblk output from the main host:
sdc 111.8G LVM2_member disk
├─vg_ceph_lab-ceph_admin 35G lvm
└─vg_ceph_lab-ceph_admin02 35G lvm
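For the record, the slicing itself boils down to a handful of LVM commands. This is a sketch, assuming /dev/sdc is the spare SSD and is safe to wipe; the volume group and LV names match the lsblk output above:

```shell
# Initialize the spare SSD as an LVM physical volume and build a volume group on it
pvcreate /dev/sdc
vgcreate vg_ceph_lab /dev/sdc

# Carve out one 35 GB logical volume per planned OSD
lvcreate -L 35G -n ceph_admin   vg_ceph_lab
lvcreate -L 35G -n ceph_admin02 vg_ceph_lab

# Verify the resulting layout
lvs vg_ceph_lab
```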
The first node (admin) took its piece without a hitch. The trouble started when I tried to add an OSD for the second node (admin-02) via the main host's CLI:
root@admin:~# pveceph osd create /dev/vg_ceph_lab/ceph_admin02
unable to get device info for '/dev/dm-6'
The Solution: Proxmox's pveceph tooling refuses LVM logical volumes and ordinary partitions when creating an OSD; it wants a whole, unpartitioned block device. Since admin-02 is a virtual machine, I leveraged Proxmox's built-in qm VM management tool. Instead of fighting the Ceph daemons on the host, I attached the logical volume directly into the admin-02 VM as a virtual disk, bypassing the host's filesystem layer entirely:
qm set 103 -scsi2 /dev/vg_ceph_lab/ceph_admin02
Inside the admin-02 VM, the OS saw a brand-new, seemingly physical, raw block device. A quick click on Create OSD in the node's Proxmox GUI, and the drive initialized perfectly!
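The same step also works headlessly. A sketch, assuming the passed-through volume shows up inside the guest as /dev/sdb (the actual device name depends on the SCSI slot used above):

```shell
# Confirm the guest sees the volume as a plain raw disk
lsblk /dev/sdb

# Equivalent of the GUI's "Create OSD" button
pveceph osd create /dev/sdb
```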
Challenge 2: An External Host and a Rebellious virt-manager
That left the third node: admin-03. The problem? This VM physically resided on my Linux Desktop PC across the room. The qm set command was useless here because the sliced LVM volume was on the server in the rack.
In a cluster architecture, Ceph doesn't care where the physical storage comes from. So I decided to ditch the server's LVM for this node and create a new virtual disk locally on my PC. But trying to add the disk via the virt-manager GUI threw a machine-type compatibility error referencing pc-q35-noble. The graphical interface completely refused to cooperate.
The Solution: When the GUI throws weird exceptions, the terminal is your best friend. I fell back on native tools from the libvirt and QEMU stack.
First, I manually generated the disk file using qemu-img:
sudo qemu-img create -f qcow2 /var/lib/libvirt/images/admin-03-ceph.qcow2 35G
Next, I attached the generated image persistently to the offline VM's configuration using virsh:
sudo virsh attach-disk Proxmox-Lab /var/lib/libvirt/images/admin-03-ceph.qcow2 vdb --config --subdriver qcow2
Upon booting admin-03, its internal Proxmox system recognized the /dev/vdb drive. Adding the third OSD in the Proxmox GUI worked flawlessly.
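Before booting, you can double-check that the attachment landed in the domain definition. virsh lists the VM's block devices (Proxmox-Lab is the domain name from the command above):

```shell
# Show all disks attached to the VM definition, including the new vdb
virsh domblklist Proxmox-Lab
```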
Finale: Pools, Replication, and Green Status
After adding the third OSD, I had a "Frankenstein" cluster: drives supplied by raw LVM from a physical server mixed with a remote qcow2 virtual disk. Did Ceph care? Not at all.
The final step was to create a Ceph Pool—the logical organizational layer where the actual data lands:
- Size: 3 (every object is stored as three replicas, one per node).
- Min. Size: 2 (the pool keeps accepting I/O as long as at least 2 of the 3 replicas are available).
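One consequence of Size: 3 worth spelling out: usable capacity is only a third of the raw space the OSDs provide. A quick sanity check for my three 35 GB slices:

```shell
# With 3x replication, usable space is roughly one third of raw OSD capacity
RAW_GB=$((3 * 35))          # three 35 GB OSDs
USABLE_GB=$((RAW_GB / 3))   # size=3 keeps three copies of every object
echo "raw: ${RAW_GB} GB, usable: ~${USABLE_GB} GB"
```

For the record, the pool itself can also be created headlessly with `pveceph pool create ceph-storage --size 3 --min_size 2`.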
CLI verification (on the main node) showed exactly what I had been fighting for all evening:
root@admin:~# ceph -s
  cluster:
    id:     25f2d0a2-d64b-42a2-b93d-99a560571e0e
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum admin,admin-02,admin-03 (age 6m)
    mgr: admin(active, since 60m), standbys: admin-02, admin-03
    osd: 3 osds: 3 up (since 6m), 3 in (since 6m)
Let's check the CRUSH (Controlled Replication Under Scalable Hashing) algorithm tree using the ceph osd tree command to confirm the cluster topology:
root@admin:~# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME          STATUS  REWEIGHT  PRI-AFF
-1         0.10258  root default
-3         0.03419      host admin
 0    hdd  0.03419          osd.0          up   1.00000  1.00000
-5         0.03419      host admin-02
 1    hdd  0.03419          osd.1          up   1.00000  1.00000
-7         0.03419      host admin-03
 2    hdd  0.03419          osd.2          up   1.00000  1.00000
As you can see, CRUSH treats my crafted nodes as three separate physical hosts and distributes replicas across them accordingly, providing me with a single, unified rbd (RADOS Block Device) storage pool labeled ceph-storage in Proxmox.
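If the storage entry doesn't appear on its own, the pool can be registered cluster-wide with pvesm. A sketch; the storage ID and content types here are my assumptions:

```shell
# Expose the Ceph pool to Proxmox as RBD storage for VM disks and container volumes
pvesm add rbd ceph-storage --pool ceph-storage --content images,rootdir
```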
Conclusion
Home lab experiments quickly expose the gap between dry documentation theory and actual implementations in simulated environments. This deployment taught me two key lessons:
- Understanding how to pass storage resources through the hypervisor (like the qm set tool in PVE) is a game changer when Ceph insists on raw block access and you only have LVM volumes available.
- When GUI software (like desktop virt-manager in my case) throws machine configuration errors, going back to basics with the pure CLI (virsh, qemu-img) lets you bypass those issues in just a few quick commands.
With a fully functioning, green cluster, I am ready for the next level: Chaos Engineering. I plan to disconnect drives while VMs are running just to see if my Ceph setup is truly as resilient as the documentation promises!
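A first, low-stakes chaos experiment needs nothing beyond systemd and Ceph itself. Here is a sketch that stops one OSD daemon and watches the cluster degrade and then heal (run it on the node hosting osd.2):

```shell
# Simulate a drive failure by stopping one OSD daemon
systemctl stop ceph-osd@2

# Expect HEALTH_WARN with 2/3 OSDs up; min_size=2 keeps the pool writable
ceph -s

# Bring the "drive" back and watch recovery kick in
systemctl start ceph-osd@2
ceph -s
```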