RE: hang with raid, postgresql

From: Don Bowman <don_at_sandvine.com>
Date: Mon, 31 May 2004 14:20:32 -0400
From: Doug White [mailto:dwhite_at_gumbysoft.com]
> On Sun, 30 May 2004, Don Bowman wrote:
> 
> > From: Doug White [mailto:dwhite_at_gumbysoft.com]
> > > On Sun, 30 May 2004, Don Bowman wrote:
> > >
> > > >
> > > > I have a system with 2x 2.8GHz XEON (P4), intel e7501 chipset,
> > > > 4GB of ram, aac [adaptec 2200s] raid with 4 scsi
> > > > disks. I have also tried asr (adaptec 2015).
> > > > I have tried two different motherboards.
> > > > The only application the machine runs is postgresql,
> > > > with ~30 databases, ~250GB of data.
> > > >
> > > > I'm finding the machine locks up solid once a day
> > > > or so (sometimes more, sometimes less, no pattern
> > > > of time of day). I know it's not a hardware issue; it
> > > > is reliable with FreeBSD 4.7. I've run through memory
> > > > test, disk test, etc.
> > > >
> > > > There appears to be a correlation between
> > > > disk activity (postgresql vacuum) and the lockup,
> > > > but I can't be sure.
> > >
> > > Temperature?
> > >
> > > What motherboard is it exactly?
> >
> > lmmon shows the mobo temperature at 28C. It is in
> > an AC-controlled environment (~20C ambient). The system
> > has 6 blower fans, ducted over the CPUs, with the
> > copper heat sinks designed for the 3.2GHz XEON.
> 
> alright so it's a pretty beefy server chassis, although it could also be
> an underperforming power supply or a scsi terminator.

It has 3 separate power supplies, all of which have been verified.
It's the 3rd piece of hardware I've tried.

> 
> > It has 3 power supplies, each with separate AC
> > inlet, fed from a UPS with filtered power.
> > It should have ~150% airflow redundancy, and
> > ~200% power redundancy.
> > This is a supermicro X5DPE motherboard.
> 
> Do you happen to have the IPMI option board for this system?

No IPMI.

> 
> Still seems hardware-related to me, although I've found hard hangs
> caused by buggy optimization on amd64.

I don't think so. I extensively tested it with FreeBSD 4.7 and memtest86.
The scsi bus was checked with a scope, and was checked with an
'ahd' controller so that we could see iuCRC errors, SCB timeouts,
etc (ahd is excellent at reporting errors, much better than
any other driver). Two disk tests were run (iozone as a benchmark,
iotest as a test) for several days.

I'm pretty sure this is a garden-variety software problem.
Currently I am suspicious of ACPI... this machine hangs
on boot if ACPI is not enabled, so it's hard to test that
theory :) The hang is in setting up and enumerating PnP ISA
devices. I guess I could expend energy to figure that out.

My next step (which I'm not looking forward to) is to try
and solder the TAP connector on and hook up my emulator.
I really really don't want to do that.

--don
Received on Mon May 31 2004 - 09:20:56 UTC
