ricco020

Posted on Jun 11

Recovering data from a failed RAID array with ddrescue: a practical walkthrough

#datarecovery #linux #sysadmin #tutorial

When a RAID array fails, the worst thing you can do is panic and start poking at it immediately. I've seen too many cases where an impatient rebuild attempt overwrote the only good copy of data. This walkthrough covers how to safely approach a degraded or failed RAID — with ddrescue as your best friend.

Step 0: Stop. Don't touch the array yet.

Before running mdadm --assemble, before doing anything, clone your physical disks. A RAID 5 with one failed drive can lose everything the moment a second drive throws a read error during rebuild. This isn't hypothetical — it's how most total RAID losses happen.

The golden rule: image first, recover second.

Step 1: Assess the damage

# Check current RAID state
cat /proc/mdstat

# More detail
mdadm --detail /dev/md0

Look for:

[UUU_] — one drive failed (underscore = missing)
[UU__] — two drives failed (catastrophic for RAID 5)
State: degraded, recovering, or failed
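
On a machine with several arrays, the status brackets are easy to misread. As a rough sketch (the function name degraded_arrays is made up for this example, and it assumes the usual /proc/mdstat layout where the array name starts a line and the [U_] map sits on the line below it), this counts the underscores for you:

```shell
# degraded_arrays [FILE]: print "<array> missing=<n>" for every array that
# shows at least one '_' in its status map. Reads /proc/mdstat by default.
degraded_arrays() {
    awk '
        $1 ~ /^md/ && $2 == ":" { name = $1 }      # e.g. "md0 : active raid5 ..."
        match($0, /\[[U_]+\]/) {                    # e.g. "[UUU_]"
            status = substr($0, RSTART + 1, RLENGTH - 2)
            missing = gsub(/_/, "_", status)        # gsub returns the number of matches
            if (missing > 0) print name, "missing=" missing
        }
    ' "${1:-/proc/mdstat}"
}
```

Run it with no argument on the live system, or pass a saved copy of /proc/mdstat. No output means no array is reporting a missing member.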

Do NOT run mdadm --manage /dev/md0 --add /dev/sdX yet. Stop the array instead:

mdadm --stop /dev/md0

Step 2: Clone each disk with ddrescue

ddrescue is the right tool because it handles read errors gracefully: it maps bad sectors, retries them, and lets you resume interrupted sessions. Never use dd for a failing disk.

Install it:

# Debian/Ubuntu
sudo apt install gddrescue

# RHEL/CentOS (the ddrescue package comes from the EPEL repository, so enable that first)
sudo dnf install ddrescue

Clone each RAID member to a separate image file (you need enough storage — same total size as all disks combined):

# First pass: copy everything readable, skip the slow scraping of bad areas
sudo ddrescue -d -n /dev/sda /mnt/backup/sda.img /mnt/backup/sda.log

# Second pass: go back and retry the bad sectors up to 3 times
sudo ddrescue -d -r3 /dev/sda /mnt/backup/sda.img /mnt/backup/sda.log

Key flags:

-d — direct disk access (bypass kernel cache)
-n / -r3 — skip the scraping phase on the first pass / retry bad sectors up to 3 times on the second
The .log mapfile is critical: it lets you resume if the clone is interrupted, and it tells the second pass which sectors still need work

Repeat for every disk in the array (sdb, sdc, etc.).
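
Before moving on, it helps to know how much of each disk actually made it into the image. The sketch below totals the block list in a ddrescue mapfile. The function name mapfile_summary is hypothetical, and it assumes the standard mapfile layout: comment lines starting with #, one current-position line, then "pos size status" lines with hexadecimal values.

```shell
# mapfile_summary MAPFILE: total the bytes in each state recorded by ddrescue
mapfile_summary() {
    rescued=0 bad=0 pending=0 seen_status_line=0
    while read -r pos size status rest; do
        case $pos in
            ''|'#'*) continue ;;                  # blank line or comment
        esac
        if [ "$seen_status_line" -eq 0 ]; then
            seen_status_line=1                    # first data line is the current-position line
            continue
        fi
        case $status in
            '+') rescued=$((rescued + $size)) ;;  # read successfully
            '-') bad=$((bad + $size)) ;;          # confirmed bad sectors
            *)   pending=$((pending + $size)) ;;  # non-tried, non-trimmed, or non-scraped
        esac
    done < "$1"
    echo "rescued=$rescued bad=$bad pending=$pending"
}

# Usage: mapfile_summary /mnt/backup/sda.log
```

All figures are in bytes. A nonzero pending value means another ddrescue run against the same mapfile may still recover more. If your ddrescue package ships ddrescuelog, its -t option prints a fuller status report.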

Step 3: Work from the images

Once you have image files, assemble a software RAID from the images using loop devices — never from the raw physical disks again:

# Set up loop devices
sudo losetup /dev/loop0 /mnt/backup/sda.img
sudo losetup /dev/loop1 /mnt/backup/sdb.img
sudo losetup /dev/loop2 /mnt/backup/sdc.img

# Try to assemble (read-only is ideal)
sudo mdadm --assemble --readonly /dev/md0 /dev/loop0 /dev/loop1 /dev/loop2

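If mdadm refuses to assemble because the superblocks disagree (typically mismatched event counts after one member dropped out before the others), retry with --force. It rewrites metadata on the members, which is safe here only because the members are images: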

sudo mdadm --assemble --force /dev/md0 /dev/loop0 /dev/loop1 /dev/loop2
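
Before forcing anything, it is worth seeing how far apart the members really are. A minimal sketch, assuming the usual mdadm --examine output where each device section starts with "/dev/...:" and contains an "Events : N" line (the helper name event_counts is made up here):

```shell
# event_counts: read `mdadm --examine` output on stdin, print "<device> <events>"
event_counts() {
    awk '
        /^\/dev\//                  { dev = $1; sub(/:$/, "", dev) }  # section header
        $1 == "Events" && $2 == ":" { print dev, $3 }                 # event counter
    '
}

# Usage: sudo mdadm --examine /dev/loop0 /dev/loop1 /dev/loop2 | event_counts
```

As a rule of thumb, a member whose count is only slightly behind dropped out recently, and that small gap is what --force is meant to override. A member that lags far behind probably left the array long ago and holds stale data, so think twice before including it.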

Step 4: Mount and verify

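# Mount read-only first; never mount degraded arrays read-write
sudo mkdir -p /mnt/raid_recovery
sudo mount -o ro /dev/md0 /mnt/raid_recovery
# If ext4 refuses because its journal needs replay, use -o ro,noload (XFS: -o ro,norecovery)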

# Check what's there
ls -la /mnt/raid_recovery/
df -h /mnt/raid_recovery/

If the filesystem is ext4 and won't mount, try fsck on the loop-assembled md device before mounting:

sudo fsck.ext4 -n /dev/md0   # -n = dry run, no writes

XFS arrays need xfs_repair -n /dev/md0 for the dry-run equivalent.
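
If you are not sure which filesystem sits on the array, a small helper can pick the matching no-write check. This is only a sketch: the name dry_run_check is hypothetical, and it covers just the two filesystem families mentioned above.

```shell
# dry_run_check FSTYPE DEVICE: print the no-write check command for a filesystem type
dry_run_check() {
    case $1 in
        ext2|ext3|ext4) echo "fsck.ext4 -n $2" ;;    # -n: open read-only, answer "no" to every prompt
        xfs)            echo "xfs_repair -n $2" ;;   # -n: no-modify mode
        *)              echo "no dry-run check known for '$1'" >&2; return 1 ;;
    esac
}

# Usage: dry_run_check "$(sudo blkid -o value -s TYPE /dev/md0)" /dev/md0
```

The function only prints the command, so you can read it before deciding to run it with sudo.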

Common RAID 5 pitfalls

Pitfall 1: Dirty bit / write-intent bitmap mismatch
If the array was running during a crash, the write-intent bitmap may be inconsistent. mdadm will want to do a full resync — on loop images, this is safe, but watch for it.

Pitfall 2: Mixed sector sizes
Some drives report 512-byte logical sectors but use 4K physical sectors internally (512e). If ddrescue reports many small errors clustered at regular intervals, check the physical sector size:

sudo blockdev --getpbsz /dev/sda
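
The physical size is only half the picture: blockdev --getss reports the logical sector size, and the pair tells you which kind of drive you have. A small sketch (sector_kind is a made-up helper name) that classifies the two numbers:

```shell
# sector_kind LOGICAL PHYSICAL: classify a drive by its two sector sizes
sector_kind() {
    logical=$1 physical=$2
    if   [ "$logical" -eq 512 ]  && [ "$physical" -eq 512 ];  then echo "512n"
    elif [ "$logical" -eq 512 ]  && [ "$physical" -eq 4096 ]; then echo "512e"
    elif [ "$logical" -eq 4096 ] && [ "$physical" -eq 4096 ]; then echo "4Kn"
    else echo "unknown ($logical/$physical)"
    fi
}

# Usage: sector_kind "$(sudo blockdev --getss /dev/sda)" "$(sudo blockdev --getpbsz /dev/sda)"
```

On a 512e drive, one bad physical sector shows up as eight consecutive unreadable 512-byte sectors. In that case it may be worth passing -b 4096 to ddrescue so it works in the drive's physical sector size.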

Pitfall 3: RAID 6 with two failed disks
RAID 6 tolerates two drive failures, but not two drives with extensive bad sectors on top of each other. Get every readable byte off both degraded disks with ddrescue before attempting assembly.

Pitfall 4: Chunk size mismatch
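RAID chunk sizes are stored in the superblock, and mdadm --assemble (with or without --force) reads them from there; it does not take a --chunk option. You only supply the chunk size by hand when the superblocks are damaged and you recreate the array over the images with mdadm --create --assume-clean. In that case --chunk=512 (or whatever the original was), the RAID level, the layout, and the device order must all match the original array. To find the original values, run mdadm --examine on a member's loop device; failing that, check an old mdadm.conf or run strings on a disk image for leftover metadata.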
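
The fields that matter for a manual rebuild can be pulled out of mdadm --examine in one go. A minimal sketch, assuming the "Key : value" layout that mdadm prints for version 1.x metadata (array_geometry is a hypothetical helper name):

```shell
# array_geometry: read `mdadm --examine` output on stdin, print the layout fields
array_geometry() {
    awk -F' : ' '
        {
            key = $1
            gsub(/^[ \t]+|[ \t]+$/, "", key)    # trim the padding mdadm adds for alignment
        }
        key == "Raid Level" || key == "Raid Devices" || key == "Chunk Size" ||
        key == "Layout" || key == "Device Role" { print key "=" $2 }
    '
}

# Usage: sudo mdadm --examine /dev/loop0 | array_geometry
```

Run it once per member and keep the output somewhere safe. The Device Role lines give you the original device order, which is the piece people most often get wrong when recreating an array.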

Verify before you declare success

# Hash check critical files
find /mnt/raid_recovery -name "*.db" -exec md5sum {} + > /tmp/recovered_hashes.txt

# Check the kernel log for md or I/O errors raised while reading the array
sudo dmesg | grep -i "raid\|md0\|error" | tail -30

Don't unmount until you've copied everything critical to a separate, healthy disk.
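
A list of hashes only helps if you compare it against something. The sketch below checks that a copy on the healthy disk matches what is on the recovered mount. The function name verify_copy and the destination path in the usage line are assumptions for illustration, and it relies on GNU md5sum.

```shell
# verify_copy SRC DST: succeed only if every file under SRC exists under DST
# with an identical MD5 hash. Mismatches are printed by md5sum.
verify_copy() {
    src=$1 dst=$2
    manifest=$(mktemp) || return 1
    rc=1
    # Hash every file under SRC using paths relative to its root,
    # then re-hash the same relative paths under DST and compare.
    if (cd "$src" && find . -type f -exec md5sum {} +) > "$manifest" &&
       (cd "$dst" && md5sum --quiet -c "$manifest"); then
        rc=0
    fi
    rm -f "$manifest"
    return "$rc"
}

# Usage: verify_copy /mnt/raid_recovery /mnt/healthy_disk/raid_copy && echo "copy verified"
```

This proves the copy is faithful to the recovered array, not that the recovered files are undamaged. Files that overlap sectors ddrescue could not read still need to be opened and checked by hand.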

When self-recovery isn't enough

If your array has two or more failed members with severe bad sectors, software reassembly may not be enough. The logical structure (stripe layout, chunk boundaries) can be reconstructed manually — but it's extremely time-consuming and error-prone without specialized tools. At that point it's worth reading a detailed overview of RAID failure modes and professional recovery options before deciding whether to escalate.

The most important takeaway: image everything before you touch anything. ddrescue + loop devices gives you a safe sandbox to experiment in without risking your only copy of the data.

Good luck — and may your parity drives never fail.

DEV Community