Re: Custom kernels causing Promise ATA RAID to go down

From: Alastair G. Hogge <agh_at_tpg.com.au>
Date: Thu, 10 Jun 2004 14:28:48 +1000
On Tuesday 08 June 2004 15:56, Allan Fields wrote:
> On Sun, Jun 06, 2004 at 07:40:15PM +1000, Alastair G. Hogge wrote:
> > For a couple of weeks now I've been having problems with my custom kernel
> > crashing the system. I've re-cvsup'd, nuked /usr/obj, and rebuilt
> > world.
> >
> > The problem is that my kernel keeps causing ATA DMA READ/WRITE
> > errors and then eventually causing my RAID array to go down, thus
> > needing a deletion and re-definition through the BIOS, plus countless
> > fsck runs.
>
> Yup, it sucks.. basically if your RAID goes bad, with most Promise
> controllers you need to reboot into BIOS and wait a long time for
> it to rebuild.  I found the Promise BIOS a little lacking.  I'm not
> a fan of oblique menu-based tools, especially when working w/ disks.
>
> Online rebuild is available on some ATA controllers but can also be
> slow.
>
> > I don't know how to capture and store the output, as the system just
> > basically hangs and freezes the keyboard. Most of the time I've been in X,
> > which can only be solved with a hard reboot.
>
> Also, just curious, but are you swapping off the RAID?
Well, not sure if there's any swapping going on. I have 1024M of system memory,
and the swap partition is located on the array.

> If your RAID has read/write errors and you use it for swap, it is
> likely that it will cause the system to lock, possibly including
> the console.
>
> Do you have a second machine to use as a serial console?
Unfortunately not. I'm working on getting one set up, though.

> Another thing to try: try pinging the host and see if it responds.
Yes I can still ping the machine.

> I use a null-modem cable and tip(1): When I was having problems w/
> my Promise controller, I'd typically capture the output using
> script(1) or screen(1).
Ahhh very handy. Thanks :-)
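
For reference, the capture described above could look roughly like the
sketch below, run on the second machine. It assumes BSD script(1) syntax
(file name first, then the command to run), that "com1" is an /etc/remote
entry for the local serial port, and that the crashing host already sends
its console to its serial port. The function name, log file name, and port
entry are illustrative only and do not come from the thread.

```shell
#!/bin/sh
# Sketch only: log a serial console session from a second machine that is
# connected to the crashing host with a null-modem cable.

capture_console() {
    log="${1:-console.log}"   # hypothetical log file name
    port="${2:-com1}"         # assumed /etc/remote entry for the serial port
    # script(1) records everything tip(1) prints to the terminal;
    # -a appends, so earlier captures survive a reconnect.
    script -a "$log" tip "$port"
}
```

Calling `capture_console crash.log` would start the session; typing `~.` at
the start of a line ends tip(1), after which the log file holds whatever the
console printed before the hang.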

> > Running a GENERIC kernel (with debugging things removed) is so slow.
> > X/KDE performs so poorly now.
>
> What's interesting is why this only happens w/ your custom kernels.
Actually, I think a GENERIC kernel just lasts longer than a custom one. I left a
GENERIC running for 6+ hours the other day while I went out; when I came back,
the system had locked up.

> I've also experienced instability with Promise RAID controllers in
> the past but didn't ever use a GENERIC kernel.  I'm interested in
> this issue, but don't know if it's related.
>
> Also: Perhaps your Promise controller or drives are overheating?
Thought about this, but I don't think it is the case. I've had the 2 HDs for
some time now, and they used to run 24/7. I have 3 fans running in my tower
case.

I've just re-built world again recently and I'm still getting problems.

I need to get that other machine going.
Received on Thu Jun 10 2004 - 02:28:26 UTC
