On Thursday 10 June 2004 14:28, Alastair G. Hogge wrote:
[I know I'm replying to myself here. But I've got some more info.]
[See arse end of email]
> On Tuesday 08 June 2004 15:56, Allan Fields wrote:
> > On Sun, Jun 06, 2004 at 07:40:15PM +1000, Alastair G. Hogge wrote:
> > > For a couple of weeks now I've been having problems with my custom
> > > kernel crashing the system. I've re-cvsup'd, nuked /usr/obj and
> > > rebuilt world.
> > >
> > > The problem is that my kernel keeps causing ATA DMA READ/WRITE
> > > errors and then eventually causing my RAID array to go down, thus
> > > needing a deletion and re-definition thru the BIOS. Plus uncountable
> > > fsck runs.
> >
> > Yup, it sucks.. basically if your RAID goes bad, with most Promise
> > controllers you need to reboot into BIOS and wait a long time for
> > it to rebuild. I found the Promise BIOS a little lacking. I'm not
> > a fan of oblique menu-based tools, especially when working w/ disks.
> >
> > Online rebuild is available on some ATA controllers but can also be
> > slow.
> >
> > > I don't know how to capture and store the output, as the system just
> > > basically hangs and freezes the keyboard. Most of the time I've been
> > > in X, which can only be solved with a hard reboot.
> >
> > Also, just curious, but are you swapping off the RAID?
>
> Well, not sure if there's any swapping going on. I have 1024M of system
> memory, and the swap partition is located on the array.
>
> > If your RAID has read/write errors and you use it for swap, it is
> > likely that it will cause the system to lock, possibly including
> > the console.
> >
> > Do you have a second machine to use as a serial console?
>
> Unfortunately not. I'm working on getting one set up though.
>
> > Another thing to try: try pinging the host and see if it responds.
>
> Yes I can still ping the machine.
>
> > I use a null-modem cable and tip(1): When I was having problems w/
> > my Promise controller, I'd typically capture the output using
> > script(1) or screen(1).
>
> Ahhh very handy. Thanks :-)
>
> > > Running a GENERIC kernel (with debugging things removed) is so slow.
> > > X/KDE performs so poorly now.
> >
> > What's interesting is why this only happens w/ your custom kernels.
>
> Actually, I think a GENERIC kernel just lasts longer than a custom one. I
> left a GENERIC running for 6+ hours the other day while I went out; when I
> came back the system had locked up.
>
> > I've also experienced instability with Promise RAID controllers in
> > the past but didn't ever use a GENERIC kernel. I'm interested in
> > this issue, but don't know if it's related.
> >
> > Also: Perhaps your Promise controller or drives are overheating?
>
> Thought about this. But I don't think it is the case. I've had the 2 HDs for
> some time now, and they used to run 24/7. I have 3 fans running in my tower
> case.
>
> I've just re-built world again recently and I'm still getting problems.
>
> I need to get that other machine going.

When the system goes down now, it drops into the kernel debugger. I can no
longer ping the host. I've also been trying to use telnet from a Windows XP
box, but that hangs when the system goes down, or I can't connect.
Anyways I wrote down the following:

ad6: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=128
ad4: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=255
ad6: FAILURE - ATA_IDENTIFY no interrupt
ad6: FAILURE - ATA_IDENTIFY no interrupt
ar1073679450: unknown array type in ar_done
ar1073679450: unknown array type in ar_done
ar1073679450: unknown array type in ar_done
ar1073679450: unknown array type in ar_done
ar1073679450: unknown array type in ar_done
ar1073679450: unknown array type in ar_done
ar1073679450: unknown array type in ar_done
ar1073679450: unknown array type in ar_done
ad4: FAILURE - ATA_IDENTIFY no interrupt
ad4: FAILURE - ATA_IDENTIFY no interrupt
ar1073679450: unknown array type in ar_done
ar1073679450: unknown array type in ar_done
ar1073679450: unknown array type in ar_done
ar1073679450: unknown array type in ar_done
ar1073679450: unknown array type in ar_done
ar1073679450: unknown array type in ar_done
ar1073679450: unknown array type in ar_done
ar1073679450: unknown array type in ar_done

Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x3fff0c62
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc05782be
stack pointer           = 0x10:0xdb642b60
frame pointer           = 0x10:0xdb642c84
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor flags         = interrupt enabled, resume, iopl=0
current process         = 28 (swi8: tty:sio clock)
kernel: type 12 trap, code 0
stopped at      bcmp+0x16:      repe cmpsl      (%esi),%es:(%edi)
db> trace
bcmp(c235b5800) at bcmp+0x16
in6_purgeaddr(c235b5800) at in6_purgeaddr+0x72
nd6_timer(0) at nd6_timer+0x272
softclock(0) at softclock+0x176
ithread_loop(c272c480,db64248,c227a480,e04723b4,0) at ithread_loop+0x134
fork_exit(c04723b4, c227a480, db642d48) at fork_exit+0x98
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xdb642d7c, ebp = 0 ---

Received on Thu Jun 10 2004 - 12:14:21 UTC
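
Once console output like the above is captured to a file (for example over a serial console with script(1), as suggested earlier in the thread), it may help to tally the ATA errors per disk to see whether one drive or both are misbehaving. The following is only a minimal sketch: the function name `tally_ata_errors` and the embedded sample text are hypothetical, and the pattern is inferred from the few lines copied above rather than from any documented message format.

```python
import re
from collections import Counter

# Matches console lines shaped like the ones copied down by hand, e.g.
#   "ad6: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=128"
#   "ad4: FAILURE - ATA_IDENTIFY no interrupt"
# The command name is matched loosely (\S+) so that misspellings in
# hand-copied text are still counted. This pattern is an assumption
# based on the sample lines, not an official format.
ATA_LINE = re.compile(
    r"^(?P<dev>a[dr]\d+): (?P<kind>TIMEOUT|FAILURE) - (?P<cmd>\S+)"
)


def tally_ata_errors(log_text):
    """Count (device, kind, command) triples found in a console capture."""
    counts = Counter()
    for line in log_text.splitlines():
        match = ATA_LINE.match(line.strip())
        if match:
            counts[(match["dev"], match["kind"], match["cmd"])] += 1
    return counts


# Hypothetical sample: the timeout and failure lines from the report.
sample = """\
ad6: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=128
ad4: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=255
ad6: FAILURE - ATA_IDENTIFY no interrupt
ad6: FAILURE - ATA_IDENTIFY no interrupt
ad4: FAILURE - ATA_IDENTIFY no interrupt
ad4: FAILURE - ATA_IDENTIFY no interrupt
"""

if __name__ == "__main__":
    for (dev, kind, cmd), n in sorted(tally_ata_errors(sample).items()):
        print(f"{dev} {kind:8} {cmd:14} x{n}")
```

Lines that do not start with a device name followed by TIMEOUT or FAILURE (the ar_done messages and the trap dump, for instance) are simply skipped, so the whole capture can be fed in unedited.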