Re: Machine hangs(Beta7), only reset button works

From: Robert Watson <rwatson_at_freebsd.org>
Date: Thu, 21 Oct 2004 07:46:05 -0400 (EDT)
On Thu, 21 Oct 2004, Tom Jensen wrote:

> I've been seeing a pretty strange problem lately with my server. 
> 
> The box freezes completely, typically when it's done running the first
> part of my backup script, leaving no way to log in on the console or
> by SSH. The freeze even happens when I'm sitting in a terminal and
> working.
> 
> There is no indication in the log files etc. about what's causing the
> problem, and it's not breaking into the debugger either :-(

This should probably be in debugging lore somewhere, but I've observed
that it's often possible to break into the debugger using a break over
serial console when it's not possible to break in using syscons.  This is
because syscons requires the Giant lock, so if the freeze happens because
a thread is spinning while holding Giant, you can't get in.  This needs to
be fixed, but hasn't been yet, so in the meantime the usual advice is to
use a serial console to generate the break.
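
As a rough sketch of what that looks like in a kernel config, assuming the
sio(4) serial driver and the option names as I remember them from NOTES
(please check them against your own source tree before relying on this):

```
# Debugger support; KDB and DDB are already in the config quoted below.
options         KDB
options         DDB

# Assumed from sio(4)/NOTES: enter the debugger when a BREAK arrives
# on the serial console.
options         BREAK_TO_DEBUGGER

# Assumed from sio(4)/NOTES: also accept the CR ~ Ctrl-B key sequence,
# for terminals or terminal servers that cannot send a real BREAK.
options         ALT_BREAK_TO_DEBUGGER
```

The console also has to be routed to the serial port for this to help,
for example with console="comconsole" in /boot/loader.conf.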

If you still can't get into the debugger, you might try one of the
various watchdog drivers -- some hardware comes with built-in watchdog
parts, supported by drivers such as ichwd(4), or you could try options
MP_WATCHDOG on an SMP box if you're willing to dedicate a CPU to running
as a watchdog for the other CPU(s).
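
In kernel config terms the two alternatives would look roughly like this.
It is only a sketch: whether ichwd(4) applies depends on the chipset, and
I have not verified either line against the 5.3-BETA7 tree.

```
# Hardware watchdog: Intel ICH watchdog timer driver.  Only useful on
# boards that actually have the part.
device          ichwd

# Software watchdog on SMP: dedicate one CPU to watching the other(s).
options         MP_WATCHDOG
```

A hardware watchdog also needs something in userland to reset it
periodically, typically watchdogd(8); otherwise it will fire on a
healthy system.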

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert_at_fledge.watson.org      Principal Research Scientist, McAfee Research


> 
> The backup script is really simple: it creates a .tgz file of a given
> directory, mounts a Windows share (mount_smbfs) and copies the file. The
> script is run by cron six times (all starting at the same time) in six
> different directories, and the box freezes after the tar processes
> finish.
> 
> Attached are the dmesg.boot and the latest top output. I don't know if
> they are any use, but it seems rather strange that a lot of processes are
> in STATE usf (not sure what this means, but I don't see this when the box
> is running normally).
> 
> The kernel is mostly GENERIC, with the following modifications:
> 
> options         IPFIREWALL
> options         IPFIREWALL_VERBOSE
> options         IPFIREWALL_VERBOSE_LIMIT=400
> options         IPDIVERT
> options         IPSEC
> options         IPSEC_ESP
> options         IPSEC_DEBUG
> device          ath
> device          ath_hal
> options         KDB
> options         DDB
> 
> bash-2.05b# uname -a
> FreeBSD bart.motd.dk 5.3-BETA7 FreeBSD 5.3-BETA7 #6: Tue Oct 19 00:36:59
> CEST 2004     root_at_bart.motd.dk:/usr/obj/usr/src/sys/GW  i386
> 
> If any more info is needed, please let me know.
> 
> Best regards
> 
> - Tom
> 
> 
Received on Thu Oct 21 2004 - 09:46:25 UTC
