RE: Accessing IDE disk with bad sectors freezes the box

From: Daniel Eriksson <daniel_k_eriksson_at_telia.com> Date: Wed, 11 May 2005 18:32:41 +0200

Ruslan Ermilov wrote:

> I have a disk with lot of bad sectors.  When working with it on
> an AMD64 box running 6-CURRENT, accessing bad areas just freezes
> the box completely, without any diagnostics.  The same disk when
> plugged into another i386 box running 4-STABLE works properly by
> issuing errors from the kernel, and reporting EIO to userland.

I was just about to report the same problem. Three days ago one of my SATA
disks suddenly developed a few bad sectors. Smartd reported this to me, so I
set out to try to recover the data. This was on an AMD Athlon XP (i386)
machine running the latest CURRENT, and the disk was hooked up to a Promise
SATA150 TX4. It didn't take long for the machine to lock up solid once I
started to read data from the disk; even a 'dd' from the raw disk caused a
solid lock (I wanted to see if the problem was VFS related).
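
For reference, when the kernel does report EIO to userland instead of
freezing, a raw-device recovery read can step over the bad areas. The
following is only a minimal sketch of that idea, not what 'dd' does
internally: the function name `salvage`, the 64 KiB default block size, and
the zero-fill policy are all assumptions made for illustration.

```python
import errno
import os


def salvage(fd, out, total_size, block_size=65536):
    """Copy total_size bytes from fd to out, zero-filling unreadable blocks.

    Hypothetical helper: returns the byte offsets of blocks that failed
    with EIO so they can be retried or inspected later.
    """
    bad_offsets = []
    offset = 0
    while offset < total_size:
        want = min(block_size, total_size - offset)
        try:
            data = os.pread(fd, want, offset)
        except OSError as exc:
            if exc.errno != errno.EIO:
                raise  # only I/O errors are treated as bad sectors
            bad_offsets.append(offset)
            data = b""
        # Pad failed or short reads so output offsets match the source.
        out.write(data.ljust(want, b"\0"))
        offset += want
    return bad_offsets
```

This only helps on a system that actually returns the error; on a box that
locks up solid, no userland loop gets the chance to continue.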

Yesterday I hooked the disk up to a spare machine (also i386) running
5.4-RC4. The motherboard has a built-in SiL 3112 based controller which I
used. I was quite surprised when instead of a crash it just printed some
errors on the console and then continued to read the data. This is what it
looked like:

ad4: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=272855487
ad4: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=429057631
ad4: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=1<ILLEGAL_LENGTH> LBA=416906399
ad4: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=1<ILLEGAL_LENGTH> LBA=416906399
ad4: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=478486239
spec_getpages:(ad4s1d) I/O read failure: (error=5) bp 0xc65ffb14 vp 0xc19ef738
               size: 65536, resid: 65536, a_count: 65536, valid: 0x0
               nread: 0, reqpage: 0, pindex: 96, pcount: 16
vm_fault: pager read error, pid 2990 (cp)
ad4: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=478507071
spec_getpages:(ad4s1d) I/O read failure: (error=5) bp 0xc65ffb14 vp 0xc19ef738
               size: 65536, resid: 65536, a_count: 65536, valid: 0x0
               nread: 0, reqpage: 0, pindex: 96, pcount: 16
vm_fault: pager read error, pid 2990 (cp)
ad4: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=478510815
spec_getpages:(ad4s1d) I/O read failure: (error=5) bp 0xc65ffb14 vp 0xc19ef738
               size: 65536, resid: 65536, a_count: 65536, valid: 0x0
               nread: 0, reqpage: 0, pindex: 192, pcount: 16
vm_fault: pager read error, pid 2990 (cp)
ad4: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=73765143
ad4: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=76399623
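
As a reading aid for the status/error fields in the log above: the numbers
only line up with the bracketed names if they are read as hexadecimal bit
masks of the ATA status and error registers (0x51 = 0x40 + 0x10 + 0x01).
Here is a small sketch of that decoding. Only the labels READY, DSC, ERROR,
ILLEGAL_LENGTH and UNCORRECTABLE come from the log; the other labels are
shorthand assumed here from the usual ATA bit definitions, and the function
name `decode` is made up for the example.

```python
# (mask, label) pairs, highest bit first, matching the order in the log.
STATUS_BITS = [
    (0x80, "BUSY"),
    (0x40, "READY"),
    (0x20, "DEVICE_FAULT"),   # assumed label, not seen in the log
    (0x10, "DSC"),
    (0x08, "DRQ"),            # assumed label
    (0x04, "CORRECTABLE"),    # assumed label
    (0x02, "INDEX"),          # assumed label
    (0x01, "ERROR"),
]

ERROR_BITS = [
    (0x80, "ICRC"),                  # assumed label
    (0x40, "UNCORRECTABLE"),
    (0x20, "MEDIA_CHANGED"),         # assumed label
    (0x10, "ID_NOT_FOUND"),          # assumed label
    (0x08, "MEDIA_CHANGE_REQUEST"),  # assumed label
    (0x04, "ABORTED"),               # assumed label
    (0x02, "NO_MEDIA"),              # assumed label
    (0x01, "ILLEGAL_LENGTH"),
]


def decode(value, table):
    """Return the labels of all bits set in value."""
    return [label for mask, label in table if value & mask]


print(decode(0x51, STATUS_BITS))  # ['READY', 'DSC', 'ERROR']
print(decode(0x40, ERROR_BITS))   # ['UNCORRECTABLE']
print(decode(0x01, ERROR_BITS))   # ['ILLEGAL_LENGTH']
```

The "(error=5)" in the spec_getpages lines is a different number space: it
is the errno value, and 5 is EIO.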

I have a 6-CURRENT installation on a spare disk that I will hook up to the
machine later and see how it handles the bad sectors using the same
controller. I'll report back later tonight if I can find the time to do it.

/Daniel Eriksson