Re: hard deadlock(?) on -current; some debugging info, need help

From: Peter Jeremy <PeterJeremy_at_optushome.com.au> Date: Thu, 26 May 2005 18:09:28 +1000

On Wed, 2005-May-25 17:18:06 -0700, Ted Faber wrote:
>The system slowly grinds to a halt, and the lockup seems to involve the
>disk system.

Nothing is waiting on physical I/O, but there are lots of locked vnodes.
I notice there's a process (sh? pid 10715) blocked on nfsreq.  Can you reproduce
the problem without the NFS-mounted filesystems?

>  I have not found a sequence that triggers them (other than
>trying to write mail to the list to report them), and I know how
>difficult that makes things.  It is common to have 2-5 a day.  Even when
>I can get to the debugger during a lockup, I cannot generate a crash
>dump - the kernel reports starting the dump and moves no bytes.

Not nice.  That suggests something below the filesystem layer is sick:
a crash dump is written straight to the dump device without going
through the filesystem, so a filesystem deadlock alone shouldn't stop it.

>I've attached a dmesg from a -v boot and the kernel config (the dmesg is
>not from the lockup run).  Last Friday when the system locked I had a
>digital camera with me and took pictures of the ps output in the hopes
>that someone could look at them.  These images are at 
>
>http://www.isi.edu/~faber/tmp/deadlock/DSCN04{75,76,77,78,79,80,81,82}.JPG

The other information we need is the output of "show lockedvnods" from
DDB.  This will hopefully point to the process that started the problem.

-- 
Peter Jeremy