The lab computer I am using has a RAID configuration consisting of two
500GB SATA hard disks. When I installed Fedora on the PC, I set it up
so that my /boot, / and swap partitions are all MD devices. I
mentioned before that this system had a failure due to a corrupt
superblock. I still haven’t figured out the reason, and the same
situation happened again yesterday. I think I have to write down the
steps for repairing it, because yesterday I was already a bit confused
as to how I did it the first time.
The failure occurs during booting. The normal procedure is interrupted by
error messages similar to this:
mdadm: WARNING /dev/sda6 and /dev/sdb6 appear to have very similar
superblocks. If they are really different, please --zero the
superblock on one If they are the same or overlap, please remove one
from the DEVICE list in mdadm.conf.
Following that comes a “failed to mount /root fs” error.
First of all, this is a RAID 1 configuration with 3 MD devices: md0,
md1 and md2.
md0 consists of sda6 and sdb6 and is mounted to /boot
md1 consists of sda7 and sdb7 and is mounted to /
md2 consists of sda8 and sdb8 and is used as swap
sda6, sdb6, sda7, sdb7 are all of type ext3.
Since it’s RAID 1, the contents of the two partitions associated
with each MD device should be the same, but it seems their superblocks
differ (mdadm calls them “very similar”), so when mdadm tries to
assemble the MD device, it fails. With no MD device activated, it is
no surprise that the root filesystem cannot be mounted. (I guess the
other two MD devices didn’t work at that time either.)
I booted into the rescue system provided by the Fedora installation
kernel. It can be reached from the installation media (CD, DVD) or
from GRUB if your /boot folder has the vmlinuz and initrd images
needed for a hard disk installation. From CD/DVD, enter linux
rescue when it starts; from GRUB, the lines to enter
look like this (you can type them line by line after you press c):
root (hd0,5)
kernel vmlinuz-install rescue
initrd initrd-install.img
boot
Once the rescue system has booted, you are at the command line
interface. You may or may not have the /dev/md* devices, but I guess
that doesn’t matter. What matters is that you have the mdadm
command and know where the original mdadm.conf file is.
The first thing is to make the mdadm.conf file
accessible. So I mounted one of the root partitions (e.g. /dev/sdb7) at
/mnt/sdb7. At this point, if you issue a command like
mdadm -D /dev/md0 --config /mnt/sdb7/etc/mdadm.conf
you will be told the MD device is not active. But if you try to assemble
it using
mdadm -A /dev/md0 --config /mnt/sdb7/etc/mdadm.conf
you will get the same warning message as during the boot process.
My trick here is this: since MD knows how to synchronize two partitions
whose contents differ but are supposed to be consistent, I can
activate an MD device with only one of its partitions, so I won’t get
the warning and at least the MD device will be up and running. Then I
hot-add the other partition and leave the rest of the work to the MD
system.
I don’t know if there is a one-liner for assembling an MD device from
selected partitions, but I do know that if a partition is “busy”, it
won’t be assembled. So I also mounted /dev/sdb6 and activated
/dev/sdb8 as swap (using swapon /dev/sdb8), and then tried assembling
all MD devices again. This time all MD devices were activated with
only one of their partitions each. Then we can release the
intentionally occupied partitions (umount for the mounted ones,
swapoff for the swap one), mount the /dev/md1 device at /mnt/md1 (so
we still have access to the mdadm.conf file; alternatively, you can
copy that file to the running system to avoid this mount) and use the
following commands to add the partitions back to their respective MD
devices.
mdadm /dev/md0 --add /dev/sdb6 --config /mnt/md1/etc/mdadm.conf
mdadm /dev/md1 --add /dev/sdb7 --config /mnt/md1/etc/mdadm.conf
mdadm /dev/md2 --add /dev/sdb8 --config /mnt/md1/etc/mdadm.conf
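The pairing of MD devices and partitions in this repair is easy to mix
up, so here is a small dry-run sketch that only prints the commands
instead of running them. The function name repair_plan and its two
arguments are made up for illustration. The --assemble --run lines are
an assumption, not what was done above: they name the surviving
partition explicitly rather than making the other one busy, and they
have not been tested against this failure. The --config option used
above is left out for brevity.

```shell
#!/bin/sh
# Dry-run sketch: prints mdadm commands, never executes them.
# Layout taken from the post: md0 = sd?6, md1 = sd?7, md2 = sd?8.

# $1: disk whose partitions start each array (default sda)
# $2: disk whose partitions are hot-added afterwards (default sdb)
repair_plan() {
    keep=${1:-sda}
    add=${2:-sdb}
    for pair in md0:6 md1:7 md2:8; do
        md=${pair%%:*}      # text before the colon, e.g. md0
        part=${pair##*:}    # text after the colon, e.g. 6
        # Assumption: --run starts the array degraded from one member.
        echo "mdadm --assemble --run /dev/$md /dev/$keep$part"
        # Same form as the --add commands in the post.
        echo "mdadm /dev/$md --add /dev/$add$part"
    done
}

repair_plan sda sdb
```

Read the printed lines before typing any of them in the rescue shell.
If sdb is the disk you trust more, swap the two arguments.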
Now only one step is left: waiting. Depending on how large your
partitions are, the resynchronization could take hours. (My
root partition is 90GB and it took around half an hour to finish.)
You can check the progress by looking at the contents of
/proc/mdstat. When it’s done, reboot and voilà, problem solved.
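Instead of re-reading /proc/mdstat by hand, the waiting can be
scripted. The two helpers below are a minimal sketch with made-up
names (rebuild_status, wait_for_rebuild). They assume the kernel
reports a rebuild in /proc/mdstat as a line containing "recovery = N%"
or "resync = N%", or "resync=DELAYED" for an array queued behind
another one. Check that against what your own kernel prints.

```shell
#!/bin/sh
# Sketch: report and wait for RAID rebuilds using an mdstat-style file.

# Print the progress lines of arrays that are rebuilding or queued.
# Exit status is 0 if there is at least one such line, non-zero otherwise.
# $1: file to read (defaults to /proc/mdstat)
rebuild_status() {
    grep -E '(recovery|resync) *= *([0-9.]+%|DELAYED)' "${1:-/proc/mdstat}"
}

# Block until no array is rebuilding or queued any more.
# $1: file to read (defaults to /proc/mdstat)
# $2: seconds between checks (defaults to 60)
wait_for_rebuild() {
    while rebuild_status "${1:-/proc/mdstat}" >/dev/null 2>&1; do
        sleep "${2:-60}"
    done
}

# Example use in the rescue shell:
#   rebuild_status        # show current progress, if any
#   wait_for_rebuild      # returns once every array is back in sync
```

Taking the file as an argument means the helpers can be tried against
a saved copy of /proc/mdstat before relying on them.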