Originally Published: Thursday, 9 August 2001 Author: Mark Warburton
Published to: enhance_articles_sysadmin/Sysadmin

Recovering from an ext2 Hard Drive Crash

Mark Warburton recently recovered from a nasty crash. An admin who refuses to give up, Mark shares his problem and ultimate solution with the readers of Linux.com.

I have a success story in recovering most of an ext2 filesystem on a hard drive that died.

I had just spent a week programming a project and was about to back up the Linux partition to CD; I only needed to get some CD media. On the day I intended to do the backup, I rebooted my machine to help someone out by running a windoze program on my win95 partition. After finishing that, I rebooted into Linux.

My blood ran cold when I got a kernel panic and the message:

hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x40 { UncorrectableError } LBAsect = xxxx, sector xxxx
end_request: I/O error, dev 03:03 (hda), sector xxxx
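The hex values in these messages are bit fields, and the words in braces are just the names of the bits that are set. As a minimal sketch (the bit-to-name tables below are my assumption, based on the ATA status and error registers and the names the old Linux IDE driver printed; the function names are made up for illustration), they can be decoded like this:

```sh
#!/bin/sh
# Print the names of all bits set in VALUE, given "bitvalue:Name" pairs.
decode_bits() {
    value=$(( $1 ))   # accepts decimal or 0x-prefixed hex
    shift
    out=""
    for pair in "$@"; do
        bit=${pair%%:*}
        name=${pair#*:}
        if [ $(( value & bit )) -ne 0 ]; then
            out="$out${out:+ }$name"
        fi
    done
    echo "$out"
}

# Assumed layout of the IDE status register.
decode_ide_status() {
    decode_bits "$1" 128:Busy 64:DriveReady 32:DeviceFault 16:SeekComplete \
        8:DataRequest 4:CorrectedError 2:Index 1:Error
}

# Assumed layout of the IDE error register (only the bits the driver named).
decode_ide_error() {
    decode_bits "$1" 128:BadSector 64:UncorrectableError 16:SectorIdNotFound \
        4:DriveStatusError 2:TrackZeroNotFound 1:AddrMarkNotFound
}

decode_ide_status 0x51
decode_ide_error 0x40
```

Read this way, status=0x51 says the drive is ready, the seek completed, and an error is flagged; error=0x40 says the sector data could not be read back even after error correction, which points at the media rather than the filesystem.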

Rebooting onto a floppy recovery disk and running:

e2fsck -f /dev/hda3

(where /dev/hda3 is my ext2 Linux partition and -f forces the check) gave a similar message when trying to read the initial superblock. My first superblock was damaged! I then tried using:

mke2fs -n /dev/hda3

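Before you get all excited, this is non-destructive. With -n, mke2fs does not create a filesystem; it only prints what it would do, including the locations of the backup superblocks on the partition. Remember the -n or you will actually reformat the partition! Also note that the locations it prints are only right if mke2fs chooses the same parameters (block size in particular) that the filesystem was originally created with.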
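If you would rather not go anywhere near mke2fs, the usual backup superblock locations can also be worked out by hand. The sketch below is only that, and it rests on assumptions the article does not confirm: the default of 8 x block-size blocks per group, a first data block of 1 for 1k blocks and 0 otherwise, and the sparse_super layout (backups in group 1 and in groups that are powers of 3, 5 and 7). On a filesystem made without sparse_super, every group carries a backup instead. The function names are hypothetical.

```sh
#!/bin/sh
# True if N is a power of BASE (including BASE^0 = 1).
is_power_of() {
    n=$1
    base=$2
    while [ $(( n % base )) -eq 0 ]; do
        n=$(( n / base ))
    done
    [ "$n" -eq 1 ]
}

# Usage: backup_superblocks BLOCK_SIZE TOTAL_BLOCKS
# Prints one candidate backup superblock (in filesystem blocks) per line.
backup_superblocks() {
    block_size=$1
    total_blocks=$2
    blocks_per_group=$(( block_size * 8 ))   # assumed default
    if [ "$block_size" -eq 1024 ]; then first=1; else first=0; fi
    group_count=$(( (total_blocks - first + blocks_per_group - 1) / blocks_per_group ))
    group=1
    while [ "$group" -lt "$group_count" ]; do
        if is_power_of "$group" 3 || is_power_of "$group" 5 || is_power_of "$group" 7; then
            echo $(( group * blocks_per_group + first ))
        fi
        group=$(( group + 1 ))
    done
}

# Example: a small filesystem of 100000 blocks of 1k each.
backup_superblocks 1024 100000
```

Any value it prints is a candidate for the xxx in the e2fsck -b command that follows; treat it as a guess to try, not as a substitute for what mke2fs -n or findsuper reports on your own partition.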

Okay, so now run e2fsck again on the next superblock:

e2fsck -f -b xxx /dev/hda3

(where xxx is an alternative superblock from the output of mke2fs.) I took down all the details for damaged files that were listed so I could delete or recover these later. Fortunately, most of the damaged files were .wav files from vinyl recordings I had been making in an effort to put them onto CD. None of my hard-won programming was reported as damaged.

The program ran without a hitch until it tried to write the corrected superblock to the first superblock, whereupon it failed in the same way as before. I tried mounting the filesystem using an alternative superblock:

mount -t ext2 -o sb=xxx /dev/hda3 /mnt

Unfortunately, it gave a checksum error (perhaps the gurus can tell me why). I tried it with other superblocks and the same thing happened. If you do not trust the results from mke2fs, you can also run a program called findsuper to locate your superblocks (http://src.openresources.com/debian/src/base/HTML/S/e2fsprogs_1.10.orig%20e2fsprogs-1.10.orig%20misc%20findsuper.c.html). It is part of e2fsprogs but is not compiled by default.
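One thing worth ruling out when mounting with an alternative superblock: the sb= mount option counts in 1k units, whereas e2fsck -b takes the number in filesystem blocks. On a filesystem with 1k blocks the two are the same, so this may well not be what went wrong here; it is only a possibility. A minimal sketch of the conversion (the function name is made up for illustration):

```sh
#!/bin/sh
# Usage: mount_sb_value FS_BLOCK_NUMBER BLOCK_SIZE_IN_BYTES
# Converts a superblock location given in filesystem blocks (as used by
# e2fsck -b) into the 1k units that mount's sb= option expects.
mount_sb_value() {
    fs_block=$1
    block_size=$2
    echo $(( fs_block * (block_size / 1024) ))
}

# Backup superblock at block 32768 on a filesystem with 4k blocks.
mount_sb_value 32768 4096
```

With 4k blocks, block 32768 becomes sb=131072; with 1k blocks the number passes through unchanged.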

Now I had a problem. I had to get the project off the system in a hurry, and I could afford neither the time nor the money for a professional data recovery job. However, I can recommend the professionals if you do have both of those commodities, as I have used them in a different capacity in the past to good effect.

I dived onto another machine and looked up everything I could find on the Net about 0x40 errors on ext2 filesystems. All I got was a lot of people telling folks like me to back up data before the crash. Great help: Not! If people are asking for help after the crash, what good does it do to tell them what they ought to have/should have/could have done before it? Bunch of armchair critics! I did find one obscure posting (it took me ages to locate it amongst the critical posts) at http://uwsg.iu.edu/hypermail/linux/kernel/9707.0/0260.html and it gave me the clue to fix the problem.

As a double check, I also ran the downloadable Seagate 3D Defence SeaTools utility to confirm that the drive itself was faulty and not the cable or the IDE ports on the motherboard. Other drive manufacturers often provide similar utilities. Unfortunately, the utility confirmed my worst suspicions: the drive media had failed and the drive needed replacing. The other system components were fine. I saved some time for the techies and me by printing the report and taking it to the shop when I exchanged the drive, as shops almost always suspect that something other than their drive is to blame.

My broken Seagate Barracuda III 20GB HDD was still under warranty, so I took it to the shop and they gave me a new one. I begged and pleaded for them to lend me the old drive. They kindly but reluctantly lent it to me after making me sign my life away.

I placed the original drive back in its original slot, removed the cable from my CD/RW and used it to plug the new drive in, configured it in the BIOS and booted Linux off a floppy.

I ran fdisk on the new drive (/dev/hdb in my case) and created a partition (hdb1) at least as large as the damaged partition. Now I executed the magic command:

dd conv=noerror,sync bs=1k if=/dev/hda3 of=/dev/hdb1

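This copies the damaged filesystem block by block onto the new drive's partition. noerror tells dd to carry on after a read error instead of stopping, and sync pads every block that could not be read in full with zeroes, so everything after a bad spot still lands at the right offset on the new partition. You end up with a copy of the filesystem, the same size as the original, which can now be repaired without hitting hardware failures. Now do a: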
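You cannot provoke a real media error on demand, but the zero padding done by conv=sync is easy to see on ordinary files. The following is a small demonstration only (the file names and the 1500-byte size are arbitrary choices of mine): the source holds one full 1k block plus a short one, and dd pads the short block out to 1k with zeroes, which is the same padding a failed read gets.

```sh
#!/bin/sh
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)

# Build a 1500-byte source made entirely of 'x' characters (no zero bytes).
dd if=/dev/zero bs=1500 count=1 2>/dev/null | tr '\0' 'x' > "$workdir/source.img"

# Same options as in the article, but on plain files.
dd conv=noerror,sync bs=1k if="$workdir/source.img" of="$workdir/copy.img" 2>/dev/null

source_size=$(( $(wc -c < "$workdir/source.img") ))
copy_size=$(( $(wc -c < "$workdir/copy.img") ))

# Count non-zero bytes in the part of the copy that lies beyond the source data.
padding_nonzero=$(( $(tail -c 548 "$workdir/copy.img" | tr -d '\0' | wc -c) ))

echo "source: $source_size bytes, copy: $copy_size bytes, non-zero bytes in padding: $padding_nonzero"

rm -rf "$workdir"
```

The small block size matters for the same reason: with bs=1k, an unreadable spot costs you at most the 1k block it sits in, whereas a large block size would zero out good data around it.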

e2fsck -f -b xxx /dev/hdb1

Remember that the first superblock is now zeroed because it could not be read from the damaged media. An alternative superblock is therefore still required. Fortunately, this worked for me. If it does not work for you, you may wish to try a utility called e2salvage (http://project.terminus.sk/e2salvage/). Apparently, it is custom made to recover virtually unrecoverable ext2 filesystems. Please read their caveats if you decide to explore this option!

This time, it wrote the first superblock just fine on the new disk. Now you can mount the hard drive in the usual way and copy the files you need off it:

mount -t ext2 /dev/hdb1 /mnt
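Since a single mistyped device name in the dd line can wipe the wrong partition, it may help to print the whole sequence and read it through before typing anything as root. The sketch below only echoes the commands from this article; it runs nothing. The function name is hypothetical, and xxx stands for whichever alternative superblock you found.

```sh
#!/bin/sh
# Usage: print_recovery_plan FAILING_PARTITION NEW_PARTITION BACKUP_SUPERBLOCK MOUNT_POINT
# Prints the copy, repair and mount commands in order; executes none of them.
print_recovery_plan() {
    failing=$1
    replacement=$2
    backup_sb=$3
    mount_point=$4

    # Refuse the one mistake that would destroy the data outright.
    if [ "$failing" = "$replacement" ]; then
        echo "source and destination are the same device: $failing" >&2
        return 1
    fi

    echo "dd conv=noerror,sync bs=1k if=$failing of=$replacement"
    echo "e2fsck -f -b $backup_sb $replacement"
    echo "mount -t ext2 $replacement $mount_point"
}

print_recovery_plan /dev/hda3 /dev/hdb1 xxx /mnt
```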
