Re: FreeBSD 8.0-BETA2/amd64 crashes on SMP under load

From: Anton Shterenlikht <mexas_at_bristol.ac.uk>
Date: Thu, 30 Jul 2009 10:05:54 +0100
On Wed, Jul 29, 2009 at 06:34:15PM -0400, Alexandre Sunny Kovalenko wrote:
> On Tue, 2009-07-28 at 15:45 +0100, Anton Shterenlikht wrote:
> > On Tue, Jul 28, 2009 at 02:22:50PM +0000, O. Hartmann wrote:
> > > Anton Shterenlikht wrote:
> > > > On Mon, Jul 27, 2009 at 10:04:28PM +0100, Anton Shterenlikht wrote:
> > > >> On Mon, Jul 27, 2009 at 09:55:12PM +0200, O. Hartmann wrote:
> > > >>> Kamigishi Rei wrote:
> > > >>>> O. Hartmann wrote:
> > > >>>>> I have the problem of crashing FreeBSD 8.0-BETA2/amd64 under load on
> > > >>>>> all of our SMP boxes. Is there an issue known at the moment? If not, I
> > > > >>>>> will prepare the kernel for witnessing and provide more information,
> > > >>>>> if you wish.
> > > >>>> A quick question: what is in the crash message, i.e. the backtrace?
> > > >>>> And what kind of crash is it - a panic() or a fatal trap?
> > > >>> On the 8-core server box, I sometimes see :
> > > >>>
> > > >>> Fatal trap 12: page fault while in kernel mode
> > > >>> fault code              = supervisor read, page not present
> > > >> Not sure if it's related, but on ia64 SMP (2 cpus) with 8.0-current and
> > > > >> later with 8.0-beta1 (I haven't built beta2 yet) I'm getting crashes
> > > >> under load every so often. E.g buildworld -j8 is likely to crash the
> > > >> box. No messages, just a sudden freeze, no backtrace or panic, and then reboot.
> > > >>
> > > >> If load is less heavy, e.g. fewer processes and some idle time, the
> > > >> problem doesn't seem to appear.
> > > >>
> > > >> I'm happy to do any further testing, if suggested.
> > > > 
> > > > my ia64 8.0-beta1 SMP box died again on
> > > > make -j8 buildworld
> > > > with no panic or log entries.
> > > > 
> > > > Is it possible that some kernel variable needs to
> > > > be increased? E.g. kern.maxproc, kern.maxfiles, etc.
> > > > Or perhaps I'm talking complete rubbish..
> > > > 
> > > 
> > > I suggest you try again with a UP kernel - a suggestion from a 
> > > kernel-noob, sorry. My SMP boxes work now with UP-kernel, but they are 
> > > really slowish although they have modern Intel C2D/Penryn cores.
> > 
> > I need SMP for OpenMP codes. It's a shame if SMP is buggy, but
> > I guess all is down to small user base..
> > 
> Before you go down that path, which, IMHO, is as counterproductive as it
> is incorrect, could you, please, show the output of 
> 
> sysctl debug | grep panic

> sysctl debug|grep panic
debug.ddb.textdump.do_panic: 1
debug.trace_on_panic: 1
debug.debugger_on_panic: 1
debug.kdb.panic: 0
>

> and check whether output of 
> 
> savecore -vC

# savecore -vC
unable to open bounds file, using 0
checking for kernel dump on device /dev/mirror/swap
mediasize = 2147483136
sectorsize = 512
magic mismatch on last dump header on /dev/mirror/swap
No dump exists
#

dumpdev wasn't configured.
I've configured it now and will try to get a crash dump next time.
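
For the archive, a minimal sketch of one way to set this up, assuming the
stock rc.conf knobs dumpdev and dumpdir. The device in the comment is simply
the one savecore reported above; whether a dump can actually be written to
it on this box is an assumption, not something tested here.

```shell
# /etc/rc.conf fragment (sh syntax), a sketch rather than a verified setup.
# "AUTO" asks the boot scripts to pick a swap device as the dump device.
dumpdev="AUTO"
# Directory where savecore stores recovered dumps at the next boot.
dumpdir="/var/crash"

# To set the dump device without rebooting, as root (device name taken
# from the savecore output above):
#   dumpon -v /dev/mirror/swap
```

After the next crash and reboot, running savecore -vC again should report
whether a dump was found on the device.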

By the way, are these two FreeBSD docs up to date:

http://www.freebsd.org/doc/en_US.ISO8859-1/books/faq/advanced.html#KERNEL-PANIC-TROUBLESHOOTING

http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html

In particular, is it still true that minidump is the default dump type?
 
many thanks

-- 
Anton Shterenlikht
Room 2.6, Queen's Building
Mech Eng Dept
Bristol University
University Walk, Bristol BS8 1TR, UK
Tel: +44 (0)117 928 8233 
Fax: +44 (0)117 929 4423
Received on Thu Jul 30 2009 - 07:06:00 UTC
