Søren Schmidt wrote:

> > I just applied your patch to clean sources dated 2004.08.04.13.00.00 and ran
> > some tests. Everything seems to be working as it should (just like after the
> > serialization patch from Ville-Pertti that I tried earlier). I will continue
> > running with this patch applied to see if it stays stable.
>
> Good! please keep me posted!

Unfortunately the machine disconnected one of the SATA discs earlier today. It did so out-of-the-blue, because there was no activity at all on either of the two discs other than the SMART monitor.

Aug 5 11:45:47 fortify kernel: ad20: WARNING - removed from configuration
Aug 5 11:45:47 fortify kernel: ata10-master: FAILURE - unknown CMD (0xb0) timed out
Aug 5 11:45:47 fortify smartd[882]: Device: /dev/ad20, not capable of SMART self-check

No other interesting messages in the log. The channel was, as usual, completely locked after this and it took an extended power-off (2 min) to unlock it (I really don't know what is up with that).

Once the channel was unlocked it booted up but page-faulted in the middle of detecting the attached discs (another reboot took care of that problem, not sure if the page fault info is interesting at all, but here it is):

[...]
ad16: 114473MB <WDC WD1200JB-00DUA3> [232581/16/63] at ata8-master UDMA100
ad18: 26059MB <Maxtor 92732U8> [52946/16/63] at ata9-master UDMA66
ad20: 239372MB <Maxtor 7Y250M0> [486344/16/63] at ata10-master SATA150
kernel trap 12 with interrupts disabled

Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x24
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc0580904
stack pointer           = 0x10:0xdd6e5c1c
frame pointer           = 0x10:0xdd6e5c44
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = resume, IOPL = 0
current process         = 35 (swi5: clock sio)
[thread 100036]
Stopped at      propagate_priority+0x84:        movl    0x24(%eax),%eax
db> trace
propagate_priority(c2734420,c078a9a0,c056f8a9,c0790780,c26e47d0) at propagate_priority+0x84
turnstile_wait(c2735bc0,c078e960,c078a9a0,0,c27440ac) at turnstile_wait+0x31c
_mtx_lock_sleep(c078e960,c2734420,0,0,0) at _mtx_lock_sleep+0xe8
softclock(0,0,ffffffff,ffffbfff,ffffffff) at softclock+0x248
ithread_loop(c26d0080,dd6e5d48,ffffffff,ffffffff,ffffffff) at ithread_loop+0x1a8
fork_exit(c05439c0,c26d0080,dd6e5d48) at fork_exit+0x80
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xdd6e5d7c, ebp = 0 ---

It should have looked something like this:

[...]
ad16: 114473MB <WDC WD1200JB-00DUA3> [232581/16/63] at ata8-master UDMA100
ad18: 26059MB <Maxtor 92732U8> [52946/16/63] at ata9-master UDMA66
ad20: 239372MB <Maxtor 7Y250M0> [486344/16/63] at ata10-master SATA150
ad22: 238475MB <WDC WD2500JD-00FYB0> [484521/16/63] at ata11-master SATA150
ar0: 476950MB <ATA RAID0 array> [60802/255/63] status: READY subdisks:
 disk0 READY on ad4 at ata2-master
 disk1 READY on ad5 at ata2-slave
ar1: 478744MB <ATA RAID0 array> [61031/255/63] status: READY subdisks:
 disk0 READY on ad6 at ata3-master
 disk1 READY on ad7 at ata3-slave
ar2: 388962MB <ATA RAID0 array> [49585/255/63] status: READY subdisks:
 disk0 READY on ad9 at ata4-slave
 disk1 READY on ad8 at ata4-master
ar3: 228946MB <ATA RAID0 array> [29186/255/63] status: READY subdisks:
 disk0 READY on ad15 at ata7-slave
 disk1 READY on ad16 at ata8-master
Waiting 5 seconds for SCSI devices to settle
[...]

I have switched back to the patch from Ville-Pertti that serializes the controller for now, to see if that is more stable.

/Daniel Eriksson

Received on Thu Aug 05 2004 - 15:11:22 UTC