Re: 6.0 hangs (while building OOo)

From: Kris Kennaway <kris_at_obsecurity.org>
Date: Thu, 6 Oct 2005 01:38:53 -0400
On Tue, Oct 04, 2005 at 07:20:13PM -0700, Don Lewis wrote:
> On  4 Oct, Mikhail Teterin wrote:
> > On Tuesday, 04 October 2005 13:08, Don Lewis wrote:
> >> Hung trying to lock a vnode ...
> >>
> >> What other processes are in the D state, and what is their wchan info?
> > 
> > mi_at_roo:~ (301) ps -lax | awk 'match($10, "D")'
> >     0     2     0   0  -8  0     0     8 -      DL    ??    0:06,50 [g_event]
> >     0     3     0   0  -8  0     0     8 -      DL    ??    0:39,71 [g_up]
> >     0     4     0   0  -8  0     0     8 -      DL    ??    0:31,21 [g_down]
> >     0     5     0   0   8  0     0     8 -      DL    ??    0:00,00 [thread taskq]
> >     0     6     0   0   8  0     0     8 -      DL    ??    0:00,00 [kqueue taskq]
> >     0     7     0   0  96  0     0     8 idle   DL    ??    0:00,00 [aic_recovery0]
> >     0     8     0   0  96  0     0     8 idle   DL    ??    0:00,00 [aic_recovery0]
> >     0     9     0   0  96  0     0     8 idle   DL    ??    0:00,00 [aic_recovery1]
> >     0    10     0   0 -16  0     0     8 ktrace DL    ??    0:00,00 [ktrace]
> >     0    39     0   0 -16  0     0     8 -      DL    ??    0:09,21 [yarrow]
> >     0    44     0   0   8  0     0     8 usbevt DL    ??    0:00,01 [usb0]
> >     0    45     0   0   8  0     0     8 usbtsk DL    ??    0:00,00 [usbtask]
> >     0    46     0   0  96  0     0     8 idle   DL    ??    0:00,00 [aic_recovery1]
> >     0    47     0   0  -8  0     0     8 -      DL    ??    0:00,91 [fdc0]
> >     0    49     0   0 -16  0     0     8 psleep DL    ??    0:03,51 [pagedaemon]
> >     0    50     0   0  20  0     0     8 psleep DL    ??    0:00,00 [vmdaemon]
> >     0    51     0   0 171  0     0     8 pgzero DL    ??   12:19,32 [pagezero]
> >     0    52     0   0 -16  0     0     8 psleep DL    ??    0:06,55 [bufdaemon]
> >     0    53     0   0  20  0     0     8 syncer DL    ??    1:00,40 [syncer]
> >     0    54     0   0  -4  0     0     8 vlruwt DL    ??    0:03,16 [vnlru]
> >     0    55     0   0 -64  0     0     8 -      DL    ??    0:11,48 [schedcpu]
> >     0   115     0   0  -8  0     0     8 mdwait DL    ??    0:05,75 [md7]
> >     0 45773 45771   0  -4  0  1740  1208 ufs    D     p1    0:00,32 dmake
> >     0 45806 45788 350  -4  0  1548   632 ufs    D     p1    0:00,00 /bin/tcsh -fc zipdep.pl -u -j  ../../../
> >     0 65072 64985 271  -4  0  1248   480 ufs    D     p1    0:00,00 /bin/tcsh -fc if ( -e ../../../unxfbsd.p
> >     0 65327  8694   0  -4  0  1432   908 ufs    D+    p2    0:02,05 find work/ -name provider.o
> 
> Mikhail and I have been looking at this offline and have discovered the
> following:
> 	The wedged processes are waiting for vnode locks in the file
>         name lookup path for the access() and lstat() syscalls.
> 
> 	There are two locked directories that are wedging these
>         processes.
> 
> 	We don't know what threads are holding the locks on these
>         directories, but we do know that it is none of the threads
>         associated with these processes, so it is not a classic deadlock
>         problem.

'show lockedvnods' doesn't help?
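
Before dropping into the debugger, it can help to cut the ps listing down
to just the userland processes stuck in disk wait.  A minimal sketch,
assuming the `ps -lax` column layout shown in the quoted listing (field 9
is the wait channel, field 10 is STAT); the function name
blocked_user_procs is made up for illustration:

```shell
#!/bin/sh
# Reads `ps -lax`-style output on stdin and prints PID, wait channel and
# state for processes whose STAT starts with "D" (disk wait).
# Rows whose STAT carries the "L" flag (pages locked in core) are skipped;
# in the quoted listing those are all kernel threads such as [syncer].
# Assumption: field 9 is MWCHAN and field 10 is STAT, as in that listing.
blocked_user_procs() {
    awk '$10 ~ /^D/ && $10 !~ /L/ { print $2, $9, $10 }'
}

# Typical use on the affected machine:
#   ps -lax | blocked_user_procs
```

On the listing above this should leave only the four processes sleeping
on "ufs".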

There is code in -current that saves stack traces when lockmgr locks
are acquired, when DEBUG_LOCKS is enabled - except that it sometimes panics
while trying to save the trace because of a code bug.  I remind jeffr
about this on a more-or-less daily basis, but he hasn't yet had time to
commit the fix he has.  It may still be useful if this is easily
reproducible.
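
For anyone who wants to try it, a rough sketch of the kernel config
additions involved, assuming a custom kernel config file; DEBUG_LOCKS is
the option mentioned above, and KDB/DDB are what make 'show lockedvnods'
available.  The trace-saving panic described above still applies until
the fix is committed:

```
# Kernel debugger framework and the DDB backend (needed for
# 'show lockedvnods' at the db> prompt).
options 	KDB
options 	DDB

# Extra lockmgr debugging; in -current this also records a stack
# trace when a lockmgr lock is acquired.
options 	DEBUG_LOCKS
```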

> This problem appears to be some sort of vnode lock leak.

Leaked lockmgr locks usually cause a panic when the thread exits.

Kris
Received on Thu Oct 06 2005 - 03:38:55 UTC
