Re: Random panics with 5.3-REL, SMP

From: Doug White <dwhite_at_gumbysoft.com>
Date: Mon, 6 Dec 2004 10:33:48 -0800 (PST)
On Sun, 5 Dec 2004, Hogan Whittall wrote:

> Here's the output of the panic.  It hung again while dumping core...
>
> Fatal trap 12: page fault while in kernel mode
>
> cpuid = 3; apic id = 02
> fault virtual address	= 0xa748d260

Hm, that is sure out in space.  It's certainly a bit-flip away from
legitimate regions, so I think the bad RAM theory is the more likely one
at this point.  The 6450 should be using ECC memory, though, so you might
make sure it's enabled and figure out a way of interrogating the ESM bits
for any messages (it'll log ECC correction events).
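
To make the bit-flip observation concrete, here is a minimal sketch that
flips each bit of the faulting address in turn and reports which results
land in kernel address space.  It assumes the default i386 kernel base of
0xc0000000 (KERNBASE); the helper name single_bit_neighbors is made up for
illustration and is not part of any FreeBSD tool.

```python
# Assumption: default FreeBSD/i386 layout, kernel mapped at 0xc0000000 and up.
KERNBASE = 0xC0000000
ADDR_MAX = 0xFFFFFFFF


def single_bit_neighbors(addr, low=KERNBASE, high=ADDR_MAX):
    """Return (bit, candidate) pairs where flipping one bit of addr
    yields an address inside [low, high]."""
    hits = []
    for bit in range(32):
        candidate = addr ^ (1 << bit)  # flip exactly one bit
        if low <= candidate <= high:
            hits.append((bit, candidate))
    return hits


if __name__ == "__main__":
    fault_addr = 0xA748D260  # "fault virtual address" from the panic output
    for bit, candidate in single_bit_neighbors(fault_addr):
        print(f"bit {bit}: 0x{candidate:08x}")
    # prints: bit 30: 0xe748d260
```

Only bit 30 produces a kernel-range address, 0xe748d260, which sits in the
same general area as the reported stack pointer (0xe971bc20).  That fits a
single flipped bit, but it does not prove one.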

Could also be a bad processor.

> fault code		= supervisor write, page not present
> instruction pointer	= 0x8:0xc060b611
> stack pointer		= 0x10:0xe971bc20
> frame pointer		= 0x10:0xe971bc20
> code segment		= base 0x0, limit 0xfffff, type 0x1b
> 			= DPL 0, pres 1, def32 1, gran 1
> processor eflags	= interrupt enabled, resume, IOPL = 0
> current process		= 52 (swi6: task queue)
> trap number		= 12
> panic: page fault
> cpuid = 3
> boot() called on cpu#3
> Uptime: 5d6h6m34s
> Dumping 3967 MB
>  16 32 48 64 80 96 112 128 144 160<hang>
>
> Whenever it panics it will hang at a random point during the dump, this
> last time was the farthest it got...
>
> Someone seems to suspect bad ram as the culprit.  I can always swap
> it out and see what happens.
>
> 	-Hogan
>
> On Wed, Nov 24, 2004 at 11:24:43PM -0800, Doug White wrote:
> > On Tue, 23 Nov 2004, Hogan Whittall wrote:
> >
> > > Dell PE6450 server, 4xP3-700 Xeon, 4gb ram, system disks reside on
> > > a 2 disk RAID1 attached to a MegaRAID controller.  Wireless is a
> > > D-Link DWL-G520-B, also has Intel Pro/100 ethernet.
> >
> > For the record, I have a PE6300 (4x500MHz) I'm testing a fix for panics of
> > the form
> >
> > panic: Previous IPI is stuck
> >
> > if that's one of the "random panics" you're seeing.  The thread "number of
> > CPUs and IPI panic" in -current has details, and a temporary patch is
> > available from
> >
> > http://people.freebsd.org/~ups/ipi4_patch
> >
> > I believe this patch will apply to both RELENG_5 and -CURRENT, for now.
> >
> > I was having a problem with random resets (no panic, just reset) and
> > traced it to a broken cpu0.  Replacing it made the resets go away.
> >
> > If you have room you may want to install RedHat or Windows on another
> > partition, load up the OpenManage tools, and inspect the system event log
> > for anything peculiar.  If you've never run this before then you may need
> > to clear the event log for it to log any new events. If you have a DRAC
> > card in the machine you can use that too.
> >
> > --
> > Doug White                    |  FreeBSD: The Power to Serve
> > dwhite_at_gumbysoft.com          |  www.FreeBSD.org
>

-- 
Doug White                    |  FreeBSD: The Power to Serve
dwhite_at_gumbysoft.com          |  www.FreeBSD.org
Received on Mon Dec 06 2004 - 17:33:49 UTC