INSTABILITY all around (Was: Re: STABILITY [Fwd: cvs commit: src/sys/kern kern_switch.c])

From: Brian Fundakowski Feldman <green_at_freebsd.org>
Date: Thu, 22 Jul 2004 11:39:13 -0400

On Thu, Jul 22, 2004 at 09:13:55AM -0600, Scott Long wrote:
> All,
> 
> This commit is another hack to try to improve stability some
> more.  Please let me know if it helps or hurts.  If it helps
> then I think that we are getting closer to at least one of the
> real culprits.

I am having VERY bad luck with -CURRENT now.  As of a few weeks ago,
the kernel was perfectly fine on UP and SMP i386.  Now, I have had to
switch PREEMPTION off on both to gain some stability back.  Turning on
debug.mpsafenet results in instareboots very quickly on my UP box,
but with debug.mpsafenet off, the instareboots take more like a day
to happen.

On SMP, I haven't tried debug.mpsafenet, but now I'm getting a hang
which is probably related to threaded programs, as it happens when
using a lot of KDE, Mozilla Firefox and such.  Instareboots might be
debuggable with a serial console, as might hangs, if they are
symptomatic of the actual panic() crashing or hanging.

I'm not holding my breath; breaking into DDB+KDB from the serial console,
whether in X or not, causes the machine to just hang completely on SMP;
I haven't tried on UP.  My machines are always using kern.sync_on_panic=0,
net.inet.tcp.sack.enable=1, INVARIANTS, WITNESS, WITNESS_SKIPSPIN,
and SCHED_ULE.  Now I have debug.debugger_on_panic=1 set and both
machines hooked up to serial consoles.

On SMP and on UP I have significant modifications to kqueue to make
it mp-safe, which showed no evidence of instability when I hammered on
them a month ago.  On SMP I have UMA init/ctor-function error checking
modifications which should almost never do anything, as I am not
remotely running out of memory.  On both I have also overhauled the
VM wiring capabilities, and I have never tripped any of the many
KASSERT()s that I added while doing that.

I am willing to run specific stress tests to help pinpoint my issues,
but I don't think it's even worth bothering to reenable the
PREEMPTION code when there are so many problems without it.  It is
of course possible that NOT using the PREEMPTION code is now broken
in subtle ways, but I don't know much about this.  I certainly
hope DDB+KDB on panic works where DDB+KDB on break does not, or I'll
probably get nowhere at all.

-- 
Brian Fundakowski Feldman                           \'[ FreeBSD ]''''''''''\
  <> green_at_FreeBSD.org                               \  The Power to Serve! \
 Opinions expressed are my own.                       \,,,,,,,,,,,,,,,,,,,,,,\
Received on Thu Jul 22 2004 - 13:39:15 UTC
