Re: unable to debug - corrupt stack? - locking issue on 6.0

From: Don Lewis <truckman_at_FreeBSD.org>
Date: Wed, 12 Oct 2005 18:54:34 -0700 (PDT)

On 13 Oct, Antony Mawer wrote:
> Hi All,
> 
> Have been trying to track down the cause of what seems to be a locking 
> problem with 6.0 (present on beta3/4/5 and rc1) with IPX... it 
> manifested itself as flakey Netware connectivity, but appears to be a 
> problem in the IPX locking. We were seeing calls to vn_lock() that got 
> wedged and never returned, and enabling WITNESS resulted in this panic:
> 
>> [root@davegproxy] /usr/obj/usr/src/sys/GPROXY6$ kgdb kernel.debug /var/crash/vmcore.1
>> [GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
>> GNU gdb 6.1.1 [FreeBSD]
>> Copyright 2004 Free Software Foundation, Inc.
>> GDB is free software, covered by the GNU General Public License, and you are
>> welcome to change it and/or distribute copies of it under certain conditions.
>> Type "show copying" to see the conditions.
>> There is absolutely no warranty for GDB.  Type "show warranty" for details.
>> This GDB was configured as "i386-marcel-freebsd".
>> 
>> Unread portion of the kernel message buffer:
>> Sleeping in "ef_output" with the following non-sleepable locks held:
>> exclusive sleep mutex ipx_mtx r = 0 (0xc186744c) locked @ /usr/src/sys/netipx/ipx_usrreq.c:595
>> Sleeping in "ef_output" with the following non-sleepable locks held:
>> exclusive sleep mutex ipx_mtx r = 0 (0xc186744c) locked @ /usr/src/sys/netipx/ipx_usrreq.c:595
>> panic: userret: Returning with 1 locks held.
>> Uptime: 58s
>> Dumping 255 MB (3 chunks)
>>   chunk 0: 1MB (159 pages) ... ok
>>   chunk 1: 254MB (65008 pages) 238 222 206 190 174 158 142 126 110 94 78 62 46 30 14 ... ok
>>   chunk 2: 1MB (256 pages)
>> 
>> #0  doadump () at pcpu.h:165
>> 165             __asm __volatile("movl %%fs:0,%0" : "=r" (td));
> 
> Trying to get a stack trace from it yields:
> 
>> (kgdb) bt
>> #0  doadump () at pcpu.h:165
>> #1  0xc05d6e34 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:399
>> #2  0xc05d70b2 in panic (fmt=0xc07c3174 "userret: Returning with %d locks held.") at /usr/src/sys/kern/kern_shutdown.c:555
>> #3  0xc05f619b in userret (td=0xc1a0f480, frame=0xd142dd38, oticks=25) at /usr/src/sys/kern/subr_trap.c:136
>> #4  0xc0767c2d in syscall (frame=
>>       {tf_fs = 59, tf_es = 59, tf_ds = 59, tf_edi = 4, tf_esi = 673886912, tf_ebp = -1077942104, tf_isp = -784147100, tf_ebx = 673809636, tf_edx = 0, tf_ecx = 0, tf_eax = 3, tf_trapno = 12, tf_err = 2, tf_eip = 673286039, tf_cs = 51, tf_eflags = 66054, tf_esp = -1077942148, tf_ss = 59}) at /usr/src/sys/i386/i386/trap.c:1026
>> #5  0xc07574bf in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:200
>> #6  0x00000033 in ?? ()
>> Previous frame inner to this frame (corrupt stack?)
> 
> Can anyone recommend where we go from here? The panic is trivially 
> reproducible simply by enabling IPX with these lines in rc.conf:
> 
> # Ethernet 802.2
> ifconfig_lnc0f2_ipx="ipx 0x2"
> ipxrouted_enable="YES"
> 
> building a kernel with WITNESS enabled, and then booting. During bootup 
> the system then panics... if you add WITNESS_SKIPSPIN then the problem 
> doesn't appear to manifest itself until a Netware volume is mounted.
> 
> Any help on how to get a more useful stack trace or tackling the 
> "Sleeping with non-sleepable locks held" errors would be greatly 
> appreciated!!

Lucky you!  I've been trying to reproduce the vnode locking problem for
several days without success.  There has been one other report of this
problem, but IPX does not seem to be involved in that case, so you are
probably experiencing two different problems.

The panic is caused by INVARIANTS detecting the leaked vnode lock.  Can
you add the DEBUG_LOCKS and DEBUG_VFS_LOCKS kernel options to see if
that catches the problem sooner?  The DEBUG_LOCKS option will store in
the vnode the location in the code where the vnode was last locked.
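
For reference, a minimal sketch of the kernel config lines this would
involve, assuming a custom config file such as the GPROXY6 one in the
quoted session.  The WITNESS and INVARIANTS lines are an assumption about
what is already enabled there; only DEBUG_LOCKS and DEBUG_VFS_LOCKS are
the additions being asked for:

```
# Assumed to be present already (INVARIANTS needs INVARIANT_SUPPORT)
options 	INVARIANTS
options 	INVARIANT_SUPPORT
options 	WITNESS

# Requested additions: extra lock debugging for vnode locks
options 	DEBUG_LOCKS
options 	DEBUG_VFS_LOCKS
```

The kernel needs to be rebuilt and reinstalled after changing the config
for the new options to take effect.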

What sort of hardware is involved: SMP or UP, amount of memory, disks?
Any unusual system tunables or sysctls set?  What is running on the
system when this panic occurs?
Received on Wed Oct 12 2005 - 23:54:43 UTC
