unable to debug - corrupt stack? - locking issue on 6.0

From: Antony Mawer <fbsd-current_at_mawer.org>
Date: Thu, 13 Oct 2005 11:13:00 +1000
Hi All,

We have been trying to track down the cause of what seems to be a locking
problem with 6.0 (present on beta3/4/5 and rc1) with IPX... it
manifested itself as flaky NetWare connectivity, but appears to be a
problem in the IPX locking. We were seeing calls to vn_lock() that got
wedged and never returned, and enabling WITNESS resulted in this panic:

> [root@davegproxy] /usr/obj/usr/src/sys/GPROXY6$ kgdb kernel.debug /var/crash/vmcore.1
> [GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "i386-marcel-freebsd".
> 
> Unread portion of the kernel message buffer:
> Sleeping in "ef_output" with the following non-sleepable locks held:
> exclusive sleep mutex ipx_mtx r = 0 (0xc186744c) locked @ /usr/src/sys/netipx/ipx_usrreq.c:595
> Sleeping in "ef_output" with the following non-sleepable locks held:
> exclusive sleep mutex ipx_mtx r = 0 (0xc186744c) locked @ /usr/src/sys/netipx/ipx_usrreq.c:595
> panic: userret: Returning with 1 locks held.
> Uptime: 58s
> Dumping 255 MB (3 chunks)
>   chunk 0: 1MB (159 pages) ... ok
>   chunk 1: 254MB (65008 pages) 238 222 206 190 174 158 142 126 110 94 78 62 46 30 14 ... ok
>   chunk 2: 1MB (256 pages)
> 
> #0  doadump () at pcpu.h:165
> 165             __asm __volatile("movl %%fs:0,%0" : "=r" (td));

Trying to get a stack trace from it yields:

> (kgdb) bt
> #0  doadump () at pcpu.h:165
> #1  0xc05d6e34 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:399
> #2  0xc05d70b2 in panic (fmt=0xc07c3174 "userret: Returning with %d locks held.") at /usr/src/sys/kern/kern_shutdown.c:555
> #3  0xc05f619b in userret (td=0xc1a0f480, frame=0xd142dd38, oticks=25) at /usr/src/sys/kern/subr_trap.c:136
> #4  0xc0767c2d in syscall (frame=
>       {tf_fs = 59, tf_es = 59, tf_ds = 59, tf_edi = 4, tf_esi = 673886912, tf_ebp = -1077942104, tf_isp = -784147100, tf_ebx = 673809636, tf_edx = 0, tf_ecx = 0, tf_eax = 3, tf_trapno = 12, tf_err = 2, tf_eip = 673286039, tf_cs = 51, tf_eflags = 66054, tf_esp = -1077942148, tf_ss = 59}) at /usr/src/sys/i386/i386/trap.c:1026
> #5  0xc07574bf in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:200
> #6  0x00000033 in ?? ()
> Previous frame inner to this frame (corrupt stack?)

Can anyone recommend where we go from here? The panic is trivially
reproducible by enabling IPX with these lines in rc.conf:

# Ethernet 802.2
ifconfig_lnc0f2_ipx="ipx 0x2"
ipxrouted_enable="YES"

then building a kernel with WITNESS enabled and booting it. The system
panics during boot; if WITNESS_SKIPSPIN is added, the problem doesn't
appear to manifest itself until a NetWare volume is mounted.

Any help on getting a more useful stack trace, or on tackling the
"Sleeping with non-sleepable locks held" errors, would be greatly
appreciated!!

Cheers
Antony
Received on Wed Oct 12 2005 - 23:12:59 UTC
