RE: ATA driver races with interrupts

From: Daniel Eriksson <daniel_k_eriksson_at_telia.com>
Date: Thu, 5 Aug 2004 19:11:22 +0200

Søren Schmidt wrote:

> > I just applied your patch to clean sources dated 2004.08.04.13.00.00 and ran
> > some tests. Everything seems to be working as it should (just like after the
> > serialization patch from Ville-Pertti that I tried earlier). I will continue
> > running with this patch applied to see if it stays stable.
> 
> Good! Please keep me posted!

Unfortunately, the machine disconnected one of the SATA discs earlier today.
It did so out of the blue: there was no activity at all on either of the two
discs other than the SMART monitor.

Aug  5 11:45:47 fortify kernel: ad20: WARNING - removed from configuration
Aug  5 11:45:47 fortify kernel: ata10-master: FAILURE - unknown CMD (0xb0) timed out
Aug  5 11:45:47 fortify smartd[882]: Device: /dev/ad20, not capable of SMART self-check
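
For reference, 0xb0 is the opcode of the SMART command in the ATA command set,
which fits with the SMART monitor being the only thing touching the discs at
the time. A minimal sketch of the lookup follows; the table and the helper name
ata_command_name are illustrative only and are not taken from the ata driver,
which printed the command as "unknown" here.

```python
# A few opcodes from the ATA command set. The table and the helper name
# are hypothetical, for illustration; they are not the driver's own code.
ATA_COMMANDS = {
    0xB0: "SMART",
    0xC8: "READ DMA",
    0xCA: "WRITE DMA",
    0xE7: "FLUSH CACHE",
    0xEC: "IDENTIFY DEVICE",
}


def ata_command_name(opcode):
    """Return a readable name for an ATA opcode, or a hex fallback."""
    return ATA_COMMANDS.get(opcode, "unknown (0x%02x)" % opcode)


# The opcode reported in the "timed out" message above.
print(ata_command_name(0xB0))
```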

No other interesting messages in the log. The channel was, as usual,
completely locked after this and it took an extended power-off (2 min) to
unlock it (I really don't know what is up with that).

Once the channel was unlocked the machine booted, but it page-faulted in the
middle of detecting the attached discs. Another reboot took care of that
problem. I am not sure the page fault info is interesting at all, but here it
is:

[...]
ad16: 114473MB <WDC WD1200JB-00DUA3> [232581/16/63] at ata8-master UDMA100
ad18: 26059MB <Maxtor 92732U8> [52946/16/63] at ata9-master UDMA66
ad20: 239372MB <Maxtor 7Y250M0> [486344/16/63] at ata10-master SATA150
kernel trap 12 with interrupts disabled


Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x24
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc0580904
stack pointer           = 0x10:0xdd6e5c1c
frame pointer           = 0x10:0xdd6e5c44
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = resume, IOPL = 0
current process         = 35 (swi5: clock sio)
[thread 100036]
Stopped at      propagate_priority+0x84:        movl    0x24(%eax),%eax
db> trace
propagate_priority(c2734420,c078a9a0,c056f8a9,c0790780,c26e47d0) at propagate_priority+0x84
turnstile_wait(c2735bc0,c078e960,c078a9a0,0,c27440ac) at turnstile_wait+0x31c
_mtx_lock_sleep(c078e960,c2734420,0,0,0) at _mtx_lock_sleep+0xe8
softclock(0,0,ffffffff,ffffbfff,ffffffff) at softclock+0x248
ithread_loop(c26d0080,dd6e5d48,ffffffff,ffffffff,ffffffff) at ithread_loop+0x1a8
fork_exit(c05439c0,c26d0080,dd6e5d48) at fork_exit+0x80
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xdd6e5d7c, ebp = 0 ---
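
One possible reading of the trap output, offered as an assumption rather than a
diagnosis: the faulting instruction reads from 0x24(%eax) and the fault virtual
address is 0x24, so %eax would have been 0, i.e. a load through a NULL pointer
at offset 0x24. The arithmetic, with a hypothetical helper name:

```python
def implied_base(fault_address, displacement):
    """Base register value implied by a faulting base+displacement access."""
    return fault_address - displacement


fault_address = 0x24  # "fault virtual address" from the trap message
displacement = 0x24   # from "movl 0x24(%eax),%eax"

base = implied_base(fault_address, displacement)
print(hex(base))  # 0x0, i.e. %eax held a NULL pointer under this reading
```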


The disc detection should have looked something like this:
[...]
ad16: 114473MB <WDC WD1200JB-00DUA3> [232581/16/63] at ata8-master UDMA100
ad18: 26059MB <Maxtor 92732U8> [52946/16/63] at ata9-master UDMA66
ad20: 239372MB <Maxtor 7Y250M0> [486344/16/63] at ata10-master SATA150
ad22: 238475MB <WDC WD2500JD-00FYB0> [484521/16/63] at ata11-master SATA150
ar0: 476950MB <ATA RAID0 array> [60802/255/63] status: READY subdisks:
 disk0 READY on ad4 at ata2-master
 disk1 READY on ad5 at ata2-slave
ar1: 478744MB <ATA RAID0 array> [61031/255/63] status: READY subdisks:
 disk0 READY on ad6 at ata3-master
 disk1 READY on ad7 at ata3-slave
ar2: 388962MB <ATA RAID0 array> [49585/255/63] status: READY subdisks:
 disk0 READY on ad9 at ata4-slave
 disk1 READY on ad8 at ata4-master
ar3: 228946MB <ATA RAID0 array> [29186/255/63] status: READY subdisks:
 disk0 READY on ad15 at ata7-slave
 disk1 READY on ad16 at ata8-master
Waiting 5 seconds for SCSI devices to settle
[...]


I have switched back to the patch from Ville-Pertti that serializes the
controller for now, to see if that is more stable.

/Daniel Eriksson
Received on Thu Aug 05 2004 - 15:11:22 UTC
