MP watchdog (or: I have a dual-xeon with processors to burn)

From: Robert Watson <rwatson_at_FreeBSD.org>
Date: Sun, 15 Aug 2004 15:46:18 -0400 (EDT)
I've just committed a hack I've been using over the last day or two to
debug hangs.  It's hardly perfect, but it is sort of neat.  Basically, it
allows you to allocate a CPU on an SMP system as a watchdog to kick you
into the debugger if there's a hang, even if it's spinning in sched_lock
or the like.  It can either fire an NMI at the boot processor, or invoke
the debugger directly.  I've included a sample "be nasty" sysctl that
attempts to cause a nasty hang which the debugger is capable of breaking
into.  Note that the current SMP hang I'm experiencing resists this
technique, but it's a useful one regardless, and is a decent substitute
for having an NMI button.  And it's a good use for that fourth logical
processor on a dual Xeon... :-)

You can add "options MP_WATCHDOG" to your i386 kernel config file, select
SCHED_4BSD as the scheduler, and use the debug.watchdog sysctl to set a
debugging CPU (I'll usually set it to 3 on my box).  In ps(1) you'll see
the idle thread on that CPU renamed to a watchdog thread.  Due to interrupt
round-robining and some IPIs, there will be situations where the watchdog
CPU does things other than watch, but this happens rarely enough that
this is useful for a broad range of debugging.  Obviously, you lose
utilization of the CPU for the duration of having the watchdog enabled.

Note: This does not work with sched_ule, only sched_4bsd.  I'll work on
fixing that at some point, but I'm still chasing the current stability
problems.

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert_at_fledge.watson.org      Principal Research Scientist, McAfee Research

---------- Forwarded message ----------
Date: Sun, 15 Aug 2004 18:02:10 +0000 (UTC)
From: Robert Watson <rwatson_at_FreeBSD.org>
To: src-committers_at_FreeBSD.org, cvs-src_at_FreeBSD.org, cvs-all_at_FreeBSD.org
Subject: cvs commit: src/sys/conf files.i386 options.i386 src/sys/i386/i386 mp_machdep.c mp_watchdog.c src/sys/i386/include mp_watchdog.h

rwatson     2004-08-15 18:02:10 UTC

  FreeBSD src repository

  Modified files:
    sys/conf             files.i386 options.i386 
    sys/i386/i386        mp_machdep.c 
  Added files:
    sys/i386/i386        mp_watchdog.c 
    sys/i386/include     mp_watchdog.h 
  Log:
  Add an "options MP_WATCHDOG" to i386.  This option allows one of the
  logical CPUs on a system to be used as a dedicated watchdog to cause a
  drop to the debugger and/or generate an NMI to the boot processor if
  the kernel ceases to respond.  A sysctl enables the watchdog running
  out of the processor's idle thread; a callout is launched to reset a
  timer in the watchdog.  If the callout fails to reset the timer for ten
  seconds, the watchdog will fire.  The sysctl allows you to select which
  CPU will run the watchdog.
  
  A sample "debug.leak_schedlock" is included, which causes a sysctl to
  spin holding sched_lock in order to trigger the watchdog.  On my Xeons,
  the watchdog is able to detect this failure mode and break into the
  debugger, which cannot otherwise be done without an NMI button.
  
  This option does not currently work with sched_ule due to ule's push-based
  notion of scheduling, just as machdep.hlt_logical_cpus fails to work
  with that scheduler.
  
  At face value, this might seem somewhat inefficient, but there are a
  lot of dual-processor Xeons with HTT around, so using one logical CPU as
  a watchdog for testing is not as inefficient as one might fear.
  
  Revision  Changes    Path
  1.503     +1 -0      src/sys/conf/files.i386
  1.213     +1 -0      src/sys/conf/options.i386
  1.234     +9 -0      src/sys/i386/i386/mp_machdep.c
  1.1       +225 -0    src/sys/i386/i386/mp_watchdog.c (new)
  1.1       +34 -0     src/sys/i386/include/mp_watchdog.h (new)
Received on Sun Aug 15 2004 - 17:48:06 UTC
