Sajit Maharjan

Posted on May 18

Troubleshooting OpenStack Instance I/O Errors: A Ceph Blocklist Case

#openstack #ceph #kubernetes #cloud

Author: Aaditya Pageni | Infrastructure Engineer

Following a planned power outage in our datacenter, we encountered an issue where all existing VMs became unresponsive while newly created VMs functioned normally. This post documents the investigation, root cause, and resolution.

## Problem Description

After the power restoration:

- Ceph reported HEALTH_OK
- OpenStack services appeared operational
- Newly created VMs booted successfully

However, all pre-existing VMs failed to start, dropping into initramfs with I/O errors before reaching the root filesystem.

```
No init found. Try passing init= bootarg.

BusyBox v1.36.1 (Ubuntu 1:1.36.1-6ubuntu3.1) built-in shell (ash)
Enter 'help' for a list of built-in commands.

(initramfs)
```

The key observation:

Newly created VMs worked without issues.

This indicated the problem was specific to the relationship between existing VMs and their storage, rather than network or storage infrastructure issues.

## Root Cause Analysis

The issue stemmed from Ceph's RBD exclusive locking mechanism.

This feature prevents simultaneous writes to the same image from multiple clients, avoiding data corruption. When a compute node connects to an RBD volume, it acquires an exclusive lock; when disconnected cleanly, it releases the lock.

During the power outage, the compute nodes lost power without disconnecting cleanly, so their locks were never released. When the nodes came back, Ceph no longer trusted their pre-outage client sessions and had placed them on the blocklist:

```bash
ceph osd blocklist ls
```

Example output:

```bash
10.88.10.91:0/3853293677 2026-05-06T08:59:47.102488+0000
10.88.10.90:0/316670229 2026-05-07T00:26:11.581329+0000
10.88.10.90:0/3783311129 2026-05-07T00:26:11.581329+0000
...
listed 14 entries
```

Ceph blocklists clients that crash without releasing locks to prevent zombie processes from corrupting data.

The old locks remained held by client IDs that no longer existed, creating a deadlock where:

- VMs needed the locks to boot
- The locks were held by processes that would never release them

## Resolution

### Verifying Lock State

We verified the theory by checking an affected volume's lock state:

```bash
rbd lock list --pool volumes --image volume-48ed0d20-f065-4536-b3f2-eac5f3abc5be
```

Output:

```bash
There is 1 exclusive lock on this image.

Locker          ID                    Address
client.3406724  auto 135766063836400  10.88.10.91:0/3853293677
```

The address matched a blocklisted entry.

The lock was held by a client that would not return to release it.
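The comparison between the lock address and the blocklist can also be scripted. The following is a minimal sketch, not part of the original procedure: the function name `lock_holder_blocklisted` and the two capture files are hypothetical, and it assumes the output formats shown in this post (lock entries start with `client.` and end with the address, and each blocklist line starts with the address).

```bash
# Succeed (and print the address) if a lock holder in captured `rbd lock list`
# output also appears in captured `ceph osd blocklist ls` output.
#   $1: file containing the `rbd lock list` output (hypothetical capture file)
#   $2: file containing the `ceph osd blocklist ls` output (hypothetical capture file)
lock_holder_blocklisted() {
    awk '
        # First file read is the blocklist: remember every listed address.
        NR == FNR { blocked[$1] = 1; next }
        # Second file is the lock list: the address is the last field of a lock entry.
        $1 ~ /^client\./ && ($NF in blocked) { print $NF; hit = 1 }
        END { exit !hit }
    ' "$2" "$1"
}

# Example usage (file names are placeholders):
#   rbd lock list --pool volumes --image <image> > locks.txt
#   ceph osd blocklist ls > blocklist.txt
#   lock_holder_blocklisted locks.txt blocklist.txt && echo "lock held by a blocklisted client"
```

Working from captured output keeps the check read-only, so it can be run before deciding whether a lock is safe to remove.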

### Removing Stale Locks

The command syntax for force-removing an RBD lock requires positional arguments with quoted strings:

```bash
rbd lock remove volumes/volume-48ed0d20-f065-4536-b3f2-eac5f3abc5be \
    "auto 135766063836400" "client.3406724"
```

Verification:

```bash
rbd lock list --pool volumes --image volume-48ed0d20-f065-4536-b3f2-eac5f3abc5be
```

Output:

```bash
No locks on this image.
```

The VM rebooted successfully.

### Bulk Resolution

For multiple affected volumes, we used the following script:

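```bash
# Remove the exclusive lock from every image in the "volumes" pool that has one.
# Caution: this does not check whether the lock holder is still alive.
for vol in $(rbd ls volumes); do
    locks=$(rbd lock list "volumes/$vol" 2>/dev/null)

    if echo "$locks" | grep -q "client"; then
        echo "Removing lock on: $vol"

        # Line 3 of the output is the lock entry: <locker> <lock ID, two words> <address>
        lock_id=$(echo "$locks" | awk 'NR==3{print $2" "$3}')
        locker=$(echo "$locks" | awk 'NR==3{print $1}')

        rbd lock remove "volumes/$vol" "$lock_id" "$locker"

        echo "Done: $vol"
    fi
done
```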
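Before removing locks in bulk, it can help to preview the exact commands that would run. The sketch below is an illustration rather than the script we used: the function name `print_lock_remove_cmds` is hypothetical, and it assumes lock IDs consist of two words (such as `auto 135766063836400`), as in the output shown earlier. It only prints commands and never calls `rbd lock remove` itself.

```bash
# Print (do not run) one `rbd lock remove` command per lock entry.
#   $1:    image spec, for example volumes/<image>
#   stdin: output of `rbd lock list` for that image
print_lock_remove_cmds() {
    awk -v spec="$1" '
        # Lock entries start with the locker (client.<id>), followed by the
        # two-word lock ID and the client address.
        $1 ~ /^client\./ {
            printf "rbd lock remove %s \"%s %s\" \"%s\"\n", spec, $2, $3, $1
        }
    '
}

# Example usage (dry run over the whole pool):
#   for vol in $(rbd ls volumes); do
#       rbd lock list "volumes/$vol" 2>/dev/null | print_lock_remove_cmds "volumes/$vol"
#   done
```

Matching on the `client.` prefix instead of a fixed line number means images with no locks simply produce no output, and the printed commands can be reviewed before being executed.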

We then hard rebooted the VMs. Note that this loop reboots every server in every project, so narrow the list if only some VMs are affected:

```bash
for vm in $(openstack server list --all-projects -f value -c ID); do
    name=$(openstack server show "$vm" -f value -c name)
    status=$(openstack server show "$vm" -f value -c status)

    echo "Rebooting: $name ($vm) - Current status: $status"

    openstack server reboot --hard "$vm"
done
```

All VMs recovered successfully.

### Clearing Blocklist Entries

After confirming all locks were released and VMs were healthy, we cleared the blocklist entries:

```bash
ceph osd blocklist rm 10.88.10.90
ceph osd blocklist rm 10.88.10.91
```

Important: Only perform this step after confirming crashed nodes will not return with stale state. Reconnecting zombie processes while another client holds the lock risks data corruption.
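Blocklist entries are listed per client instance (`IP:port/nonce`), so a host with several entries may need several removals. As a hedged sketch, the hypothetical helper below, `print_blocklist_rm_cmds`, reads `ceph osd blocklist ls` output and prints one `ceph osd blocklist rm` command per entry belonging to the given hosts, using the full address exactly as listed. It assumes IPv4 addresses and only prints the commands for review.

```bash
# Print (do not run) a `ceph osd blocklist rm` command for each listed entry
# whose IP matches one of the given hosts. IPv4 only in this sketch.
#   $@:    host IPs, for example 10.88.10.90 10.88.10.91
#   stdin: output of `ceph osd blocklist ls`
print_blocklist_rm_cmds() {
    awk -v ips="$*" '
        BEGIN {
            n = split(ips, list, " ")
            for (i = 1; i <= n; i++) want[list[i]] = 1
        }
        {
            # "10.88.10.91:0/3853293677" -> the IP is the part before the first colon
            split($1, parts, ":")
            if (parts[1] in want) print "ceph osd blocklist rm " $1
        }
    '
}

# Example usage:
#   ceph osd blocklist ls 2>/dev/null | print_blocklist_rm_cmds 10.88.10.90 10.88.10.91
```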

## Prevention Measures

### Granting OpenStack Blocklist Capabilities

OpenStack requires specific Ceph capabilities to manage blocklist entries automatically.

Without the following capability in its mon caps, Nova cannot clear stale entries automatically:

```bash
allow command "osd blocklist"
```

#### Step 1: Check Current Capabilities

```bash
ceph auth get client.openstack
```

#### Step 2: Add Blocklist Capability

```bash
# First, save the existing keyring, including the current caps
ceph auth get client.openstack -o /tmp/openstack.keyring

# Then update caps (adjust pool names and OSD caps for your environment).
# Note: `ceph auth caps` replaces all existing caps, so list every cap you want to keep.
ceph auth caps client.openstack \
    mon 'allow r, allow command "osd blocklist"' \
    osd 'allow class-read object_prefix rbd_children, allow rwx pool=images, allow rwx pool=volumes, allow rwx pool=vms, allow rwx pool=backups'
```

Note: Adjust pool names according to your environment (vms, volumes, images, etc.).

#### Step 3: Verify the Update

```bash
ceph auth get client.openstack
```
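The verification can be reduced to a yes/no check. This is a minimal sketch with a hypothetical function name, `has_blocklist_cap`, and it assumes the `ceph auth get` output contains a `caps mon` line holding the capability string. It only looks for the explicit `osd blocklist` command capability, so it will not recognize a client that gets the permission some other way, such as through a profile.

```bash
# Succeed if the mon caps in `ceph auth get` output mention "osd blocklist".
#   stdin: output of `ceph auth get client.openstack`
has_blocklist_cap() {
    grep 'caps mon' | grep -q 'osd blocklist'
}

# Example usage:
#   if ceph auth get client.openstack 2>/dev/null | has_blocklist_cap; then
#       echo "blocklist capability present"
#   else
#       echo "blocklist capability missing"
#   fi
```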

### Nova Configuration Tuning

The following settings were added to nova.conf on compute nodes:

```ini
[libvirt]
hw_disk_discard = unmap
disk_cachemodes = network=writeback
rbd_io_timeout = 30
```

The rbd_io_timeout parameter gives the RBD client additional time to recover during transient issues rather than immediately failing I/O.

## Key Takeaways

- Ceph's blocklist mechanism protects data from split-brain scenarios. The issue arises from unclean shutdowns leaving orphaned locks behind.
- New VMs working while existing VMs fail is a strong diagnostic indicator. This pattern after an outage strongly suggests blocklist-related issues.
- Proactively grant blocklist permissions to the OpenStack Ceph client. The allow command "osd blocklist" capability enables automatic recovery without manual intervention.
- The rbd lock remove syntax requires positional arguments with quoted strings. The --locker flag is not available in many versions. Correct format:

  ```bash
  rbd lock remove <pool>/<image> "<lock_id>" "<locker>"
  ```

- Include lock-related failure scenarios in disaster recovery testing. Standard monitoring and backup verification may not catch this failure mode.

DEV Community