What happens when you run a two-node Incus cluster, one of the nodes suffers a disaster (say, SD card corruption), and your .img backup turns out not to help after recovery?
Step-by-step Breakdown
After realizing that the second node (with the dead SD card) couldn't boot anymore, I began recovery on the surviving node.
Check the cluster
incus cluster list
From here, I identified the name of the broken node (broken-node in this example) and noted which containers were located on it.
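To double-check where everything lives, the instance list can be printed across projects as a quick read-only check (the --all-projects flag is only needed if some instances sit outside the default project):
incus list --all-projects
In a cluster, the LOCATION column of this output should show which member each instance is pinned to.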
Try removing the broken node
incus cluster remove broken-node
But it came back with an error:
Error: Node still has the following instances: a1, a2, a3, b4, b5
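Before touching anything, it's worth confirming what the cluster database itself still associates with the dead member. This is a read-only query against the same tables used later in this post (adjust the member name to your own):
incus admin sql global "SELECT instances.name FROM instances JOIN nodes ON nodes.id = instances.node_id WHERE nodes.name = 'broken-node';"
It should return the same a1 ... b5 list as the error above.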
Try deleting those instances manually
incus delete a1
or
incus delete a1 --force
Still failed:
Error: Failed checking instance exists "local:a1": Missing event connection with target cluster member
The surviving node cannot open an event connection to the dead member, so the normal CLI delete path is blocked.
Final Fix: Using SQL to Delete Orphaned Instances
At this point, the only way was to manually remove the leftover metadata using the Incus admin SQL tool.
incus admin sql global "SELECT name, node_id FROM instances WHERE name='a1';"
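The SELECT above returns the node_id of the dead member (2 in my case, matching the nodes table further down). With that confirmed, the orphaned row can be dropped directly. This is the destructive part, so treat it as a sketch and double-check the id before running it; repeat for each stuck instance (a2, a3, b4, b5):
incus admin sql global "DELETE FROM instances WHERE name='a1' AND node_id=2;"
Re-running the SELECT afterwards should return nothing, and once all the orphaned instances are gone the cluster remove can be retried.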
=== This part only partially succeeded [KIV] ===
incus cluster remove broken-node
Error: Delete "https://192.168.xxx.xxx:8443/1.0/storage-pools/local": Unable to connect to:
The surviving node was still trying to tell the unreachable member to delete its local storage pool, so I went back to the database to see how the pool was still wired to the dead node:
incus admin sql global "SELECT id, name FROM nodes;"
incus admin sql global "SELECT id, name FROM storage_pools;"
nodes:
1|node-alive
2|broken-node
storage_pools:
1|local
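Before deleting anything here, the link table that maps pools to members can be dumped as a read-only sanity check; it should show a row tying pool 1 to node 2:
incus admin sql global "SELECT * FROM storage_pools_nodes;"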
incus admin sql global "DELETE FROM storage_pools_nodes WHERE node_id=2 AND storage_pool_id=1;"
incus admin sql global "SELECT id, name, type, project_id FROM storage_volumes WHERE node_id=2;"
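I stopped here, so the rest is untested. If that query shows volumes still attached to the dead member, the likely follow-up is to drop those rows the same way and then retry the member removal, roughly:
incus admin sql global "DELETE FROM storage_volumes WHERE node_id=2;"
incus cluster remove broken-node
If it still refuses, incus cluster remove also accepts a --force flag for members that are permanently offline (check incus cluster remove --help), which may be a cleaner route than more hand-written SQL against the nodes table.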
I haven't tried importing the backup from the MinIO server yet.