DEV Community

iapilgrim

Troubleshooting KVM Issues

🧱 1️⃣ Initial Symptom

❌ “Booting from hard disk…” (hangs)

What This Means Technically

Firmware attempted to boot from the virtual disk but:

  • Could not find a valid bootloader
  • Or could not understand the partition scheme
  • Or firmware mode didn’t match OS install mode

This message comes from firmware (not Linux).

In KVM environments, firmware is provided by:

  • Legacy BIOS (SeaBIOS)
  • UEFI (OVMF from TianoCore)

🔎 2️⃣ First Investigation: Is the Disk Corrupted?

We checked:

qemu-img info <disk-image>
virt-filesystems -a <disk-image> --all --long -h

What We Found

  • qcow2 format valid
  • Not corrupted
  • GPT layout
  • /dev/sda1 → vfat (511M)
  • /dev/sda2 → ext4
  • /dev/sda3 → swap

Critical Clue

vfat EFI partition = UEFI installation

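That made two explanations unlikely:

  • Disk corruption
  • Missing root filesystem

It did not yet prove that GRUB's files were present on the EFI partition; that was confirmed later by starting GRUB from the EFI shell.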

💡 Lesson: Always inspect disk structure before guessing.
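
As a rough illustration of that inspection step, the sketch below guesses the install mode from `virt-filesystems` output. The function name `guess_boot_mode` and the image name `disk.qcow2` are made up for this example, and the column layout (name, type, VFS) is an assumption about the `--long` output. A vfat filesystem is a strong hint of a UEFI install, not proof.

```shell
#!/bin/sh
# Hypothetical helper: reads `virt-filesystems --all --long` output on stdin
# and prints a guess about how the guest was installed.
# Assumes column 2 is the object type and column 3 is the filesystem type.
guess_boot_mode() {
    awk '
        $2 == "filesystem" && $3 == "vfat" { found = 1 }
        END { print (found ? "uefi-likely" : "bios-likely") }
    '
}

# Usage (disk.qcow2 is a placeholder path):
#   virt-filesystems -a disk.qcow2 --all --long -h | guess_boot_mode
```

The function only reads text, so it can be tried on saved output without touching the disk image.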


⚡ 3️⃣ Root Cause #1 — Firmware Mismatch

Your VM XML showed:

<os>
  <type arch='x86_64' machine='pc-q35-9.2'>hvm</type>
  <boot dev='hd'/>
</os>

No <loader> element (and no firmware='efi' attribute on <os>) → libvirt falls back to legacy BIOS (SeaBIOS)

But your disk layout proved:

  • Installed using UEFI
  • GPT partition table
  • EFI System Partition
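
The firmware side of that comparison can be checked from the domain XML. The sketch below is a text heuristic, not an XML parser: it assumes a UEFI domain shows either a pflash loader or `firmware='efi'`. The function name `domain_firmware_mode` and the domain name `myvm` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads libvirt domain XML on stdin and prints "uefi"
# if it sees a pflash loader or firmware='efi', otherwise "bios".
# This is a simple pattern match and can be fooled by unusual XML.
domain_firmware_mode() {
    if grep -Eq "type=['\"]pflash['\"]|firmware=['\"]efi['\"]"; then
        echo uefi
    else
        echo bios
    fi
}

# Usage (myvm is a placeholder domain name):
#   virsh dumpxml myvm | domain_firmware_mode
```

If this prints bios while the disk carries an EFI System Partition, the mismatch described in this section is the likely cause.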

Why BIOS Could Not Boot

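Legacy BIOS:

  • Reads the first sector of the disk (the MBR) and runs the boot code it finds there
  • Expects GRUB's first stage in that boot code
  • On a UEFI-installed GPT disk, typically finds only a protective MBR with no boot code
  • Does not read the EFI System Partition or load .efi files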

UEFI:

  • Reads FAT EFI partition
  • Loads .efi binaries

So BIOS firmware had no idea how to boot your disk.
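
To see roughly what the BIOS sees, one option is to look at the first partition entry in the MBR. In the MBR layout the first entry starts at byte 446 and its type field is 4 bytes in, so the byte of interest is at offset 450; a value of `ee` marks a GPT protective MBR. This sketch works on raw images only, and the function and file names are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: prints the partition type byte (hex) of the first
# MBR partition entry of a RAW disk image (byte offset 450).
# "ee" indicates a GPT protective MBR.
mbr_first_partition_type() {
    od -An -tx1 -j450 -N1 "$1" | tr -d ' \n'
}

# Usage (a qcow2 image must be converted first; file names are placeholders):
#   qemu-img convert -O raw disk.qcow2 disk.raw
#   mbr_first_partition_type disk.raw
```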

💡 Lesson:
UEFI install + BIOS firmware = guaranteed boot failure.

This is extremely common in:

  • KVM
  • Proxmox VE
  • OpenStack

🔧 4️⃣ Fix #1 — Install OVMF (UEFI Firmware)

Your system didn’t even have OVMF installed:

/usr/share/OVMF/OVMF_CODE.fd → missing

We installed the firmware package (ovmf) and found:

OVMF_CODE_4M.fd
OVMF_VARS_4M.fd

Then updated VM XML:

<!-- both elements go inside <os> -->
<loader readonly='yes' type='pflash'>/usr/share/OVMF/OVMF_CODE_4M.fd</loader>
<nvram>...</nvram>

Now the firmware matched the disk layout.

💡 Lesson:
Hypervisor provides firmware. OS depends on firmware type.
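
Since the host supplies the firmware, it is worth checking which OVMF images the host actually ships, because file names vary between distributions and package versions. A minimal sketch, assuming the images live in one directory; `find_ovmf_code` is a made-up name and `/usr/share/OVMF` is simply the path used in this post.

```shell
#!/bin/sh
# Hypothetical helper: lists OVMF code images in a directory.
# $1 is the directory to search; the default is the path from this post,
# other distributions may use a different location.
find_ovmf_code() {
    dir=${1:-/usr/share/OVMF}
    for f in "$dir"/OVMF_CODE*.fd; do
        # The glob stays unexpanded when nothing matches, so test each result.
        if [ -e "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
}

# Usage:
#   find_ovmf_code                   # search the default directory
#   find_ovmf_code /some/other/dir   # placeholder path
```

Empty output means no code image was found there, which matches the "missing" state described above.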


🧨 5️⃣ New Symptom — UEFI Shell Appeared

After switching to UEFI, VM booted into:

Shell>

This meant:

  • UEFI firmware is working
  • EFI partition exists
  • But no boot entry in NVRAM

🧠 Why That Happened

When you switched firmware modes:

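  • A new NVRAM file was created
  • It had no stored boot entries
  • So the firmware had no entry pointing at \EFI\debian\grubx64.efi, and Debian does not install a copy at the fallback path \EFI\BOOT\BOOTX64.EFI by default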

This is expected behavior.

💡 Lesson:
UEFI boot depends on:

  1. EFI partition
  2. Bootloader file
  3. NVRAM boot entry

All three must exist.
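
The second item on that list is easy to verify once the EFI System Partition is mounted somewhere. The sketch below lists every .efi file under its EFI directory. The function name `list_efi_loaders` is hypothetical, and the mount point is whatever you pass in (inside a running Debian guest it is usually /boot/efi).

```shell
#!/bin/sh
# Hypothetical helper: lists bootloader binaries on a mounted EFI System
# Partition. $1 is the directory where the partition is mounted.
list_efi_loaders() {
    find "$1/EFI" -type f -iname '*.efi' 2>/dev/null | sort
}

# Usage (run inside the guest, or against an offline mount of the image):
#   list_efi_loaders /boot/efi
```

A result under EFI/debian with nothing under EFI/BOOT is consistent with a firmware that drops to the shell until a boot entry is registered.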


🛠 6️⃣ Manual Boot from EFI Shell

Inside shell:

fs0:
cd EFI\debian
grubx64.efi

GRUB launched → Debian booted.

This proved:

  • Disk is healthy
  • GRUB installed correctly
  • Root filesystem intact

💡 Lesson:
UEFI shell is a powerful debugging tool.


🔐 7️⃣ Root Password Problem

There is no default root password in Debian.

During installation:

  • If you leave the root password blank → the root account is locked and the first user gets sudo access
  • You then log in as that user and use sudo

You chose to reset password offline using:

guestmount + chroot

That demonstrated how to:

  • Mount a qcow2 image offline
  • Enter its filesystem
  • Modify the system safely (only while the VM is shut off)

💡 Lesson:
Virtual disks are just files.
You can perform offline surgery safely.
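
The guestmount + chroot approach might look roughly like the sketch below. The exact commands used in this case are not recorded in the post, so treat this as one plausible sequence: the function names, the image path and the mount point are all placeholders. It needs root, and the VM must be shut off first. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical wrapper around the guestmount + chroot approach.
# With DRY_RUN=1 it only prints each command instead of running it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# $1: disk image, $2: empty directory to mount on (both placeholders).
# Run as root, and only while the VM is shut off.
reset_root_password() {
    image=$1
    mnt=$2
    # -i asks libguestfs to detect the guest's filesystems and mount them.
    run guestmount -a "$image" -i --rw "$mnt" || return 1
    run chroot "$mnt" passwd root
    status=$?
    # Always unmount, even if passwd failed.
    run guestunmount "$mnt"
    return "$status"
}

# Usage:
#   DRY_RUN=1 reset_root_password disk.qcow2 /mnt/guest   # preview only
#   reset_root_password disk.qcow2 /mnt/guest             # real run
```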


🏁 8️⃣ Permanent Fix — Register GRUB Properly

After booting successfully:

grub-install
update-grub
efibootmgr

You got:

Boot0003* debian
BootOrder: 0003,...

That means:

  • GRUB installed to EFI partition
  • UEFI boot entry created
  • Boot order corrected

Now the VM boots automatically.

💡 Lesson:
UEFI requires proper NVRAM registration.
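
To confirm the registration without reading the output by eye, a small filter can report which entry is first in the boot order. This is a sketch under assumptions: `first_boot_label` is a made-up name, it expects the usual `BootOrder:` and `BootNNNN* label` lines, and it only returns the first word of a label.

```shell
#!/bin/sh
# Hypothetical helper: reads `efibootmgr` output on stdin and prints the
# label of the entry that comes first in BootOrder.
# Limitation: only the first word of a multi-word label is printed.
first_boot_label() {
    awk '
        /^BootOrder:/ { split($2, order, ","); first = order[1] }
        /^Boot[0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f]/ {
            num = substr($1, 5, 4)   # the four hex digits after "Boot"
            label[num] = $2
        }
        END { if (first in label) print label[first] }
    '
}

# Usage (inside the guest, as root):
#   efibootmgr | first_boot_label
```

With the entry shown in this section at the front of the boot order, the expected result is debian.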


📚 Full Timeline of Issues

Phase | Problem               | Root Cause                | Fix
1     | Hang at boot          | Firmware mismatch         | Enable UEFI
2     | OVMF missing          | Firmware not installed    | Install ovmf
3     | XML validation errors | Wrong schema structure    | Correct <os> block
4     | UEFI shell            | No NVRAM boot entry       | Manual grub launch
5     | Root login fail       | No default password       | Reset via chroot
6     | Manual boot required  | Boot entry not registered | grub-install

🧠 Core Concepts You Mastered

🔹 Firmware Layers

  • BIOS vs UEFI
  • SeaBIOS vs OVMF

🔹 Disk Layout

  • GPT vs MBR
  • EFI partition
  • Root partition
  • Swap

🔹 libvirt XML

  • <loader>
  • <nvram>
  • Machine type (q35)

🔹 UEFI Mechanics

  • EFI shell
  • .efi bootloaders
  • NVRAM boot entries
  • efibootmgr

🔹 Virtual Disk Forensics

  • qemu-img
  • virt-filesystems
  • guestmount
  • chroot repairs

🚀 Biggest Takeaway

The problem was never the OS.

It was the relationship between:

Firmware
↕
Bootloader
↕
Partition scheme
↕
Kernel

In virtualization, firmware mismatches are one of the most common causes of boot failure.

Now you understand that chain completely.


🏆 What Level You’re At Now

You moved from:

“VM won’t boot”

To:

Debugging firmware, UEFI NVRAM, partition schemes, and GRUB inside KVM.

That’s real infrastructure-level troubleshooting.
