Re: panic: _mtx_lock_sleep: recursed on non-recursive mutex process lock @ /usr/src/sys/kern/kern_exit.c:912

From: Kostik Belousov <kostikbel_at_gmail.com>
Date: Sun, 10 Apr 2011 18:18:47 +0300
On Sun, Apr 10, 2011 at 03:37:59PM +0200, Fabian Keil wrote:
> The following panic seems to be reliably reproducible with sources
> from yesterday (and today) by running claws-mail in X, switching to
> the console and trying to attach gdb to it with 'gdb -p $(pgrep claws-mail)'.
> 
> gdb somehow fails to attach, and after leaving gdb the panic occurs.
> 
> fk_at_r500 /usr/crash $cat info.0 
> Dump header from device /dev/ada0s1b
>   Architecture: amd64           
>   Architecture Version: 2       
>   Dump Length: 347676672B (331 MB)
>   Blocksize: 512                
>   Dumptime: Sat Apr  9 22:48:15 2011
>   Hostname: r500.local          
>   Magic: FreeBSD Kernel Dump    
>   Version String: FreeBSD 9.0-CURRENT #284 r+1c500d9: Sat Apr  9 15:44:51 CEST 2011
>     fk_at_r500.local:/usr/obj/usr/src/sys/ZOEY
>   Panic String: _mtx_lock_sleep: recursed on non-recursive mutex process lock _at_ /usr/src/sys/kern/kern_exit.c:912
> 
>   Dump Parity: 1557143341
>   Bounds: 0
>   Dump Status: good
> 
> fk_at_r500 /usr/crash $kgdb /boot/kernel/kernel vmcore.0 
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "amd64-marcel-freebsd"...
> 
> Unread portion of the kernel message buffer:
> panic: _mtx_lock_sleep: recursed on non-recursive mutex process lock _at_ /usr/src/sys/kern/kern_exit.c:912
> 
> cpuid = 0
> KDB: enter: panic
> panic: from debugger
> cpuid = 0
> Uptime: 3m48s
> Physical memory: 1974 MB
> Dumping 331 MB: 316 300 284 268 252 236 220 204 188 172 156 140 124 108 92 76 60 44 28 12
> 
> Reading symbols from /boot/kernel/zfs.ko...Reading symbols from /boot/kernel/zfs.ko.symbols...done.
> done.
> [...]
> Loaded symbols for /boot/kernel/drm.ko
> #0  doadump () at /usr/src/sys/kern/kern_shutdown.c:250
> 250             if (textdump_pending)
> (kgdb) where
> #0  doadump () at /usr/src/sys/kern/kern_shutdown.c:250
> #1  0xffffffff80531fb7 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:418
> #2  0xffffffff80531a51 in panic (fmt=Variable "fmt" is not available.
> ) at /usr/src/sys/kern/kern_shutdown.c:591
> #3  0xffffffff803202b7 in db_panic (addr=Variable "addr" is not available.
> ) at /usr/src/sys/ddb/db_command.c:478
> #4  0xffffffff80320761 in db_command (last_cmdp=0xffffffff80c9fb60, cmd_table=Variable "cmd_table" is not available.
> ) at /usr/src/sys/ddb/db_command.c:445
> #5  0xffffffff803209b0 in db_command_loop () at /usr/src/sys/ddb/db_command.c:498
> #6  0xffffffff80322a99 in db_trap (type=Variable "type" is not available.
> ) at /usr/src/sys/ddb/db_main.c:229
> #7  0xffffffff80567ba1 in kdb_trap (type=3, code=0, tf=0xffffff8000017740) at /usr/src/sys/kern/subr_kdb.c:533
> #8  0xffffffff807cee06 in trap (frame=0xffffff8000017740) at /usr/src/sys/amd64/amd64/trap.c:590
> #9  0xffffffff807b9303 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:228
> #10 0xffffffff805679fb in kdb_enter (why=0xffffffff80955faf "panic", msg=0xa <Address 0xa out of bounds>) at cpufunc.h:63
> #11 0xffffffff80531a60 in panic (fmt=Variable "fmt" is not available.
> ) at /usr/src/sys/kern/kern_shutdown.c:574
> #12 0xffffffff805222f9 in _mtx_lock_sleep (m=Variable "m" is not available.
> ) at /usr/src/sys/kern/kern_mutex.c:341
> #13 0xffffffff805223a5 in _mtx_lock_flags (m=Variable "m" is not available.
> ) at /usr/src/sys/kern/kern_mutex.c:203
> #14 0xffffffff80503789 in proc_reparent (child=0xfffffe001290a000, parent=0xfffffe0012e48900) at /usr/src/sys/kern/kern_exit.c:912
> #15 0xffffffff80503ed8 in kern_wait (td=0xffffff8000017adc, pid=Variable "pid" is not available.
> ) at /usr/src/sys/kern/kern_exit.c:708
> #16 0xffffffff805043e5 in wait4 (td=Variable "td" is not available.
> ) at /usr/src/sys/kern/kern_exit.c:653
> #17 0xffffffff805751ab in syscallenter (td=0xfffffe00028838c0, sa=0xffffff8000017bb0) at /usr/src/sys/kern/subr_trap.c:344
> #18 0xffffffff807ce8bc in syscall (frame=0xffffff8000017c50) at /usr/src/sys/amd64/amd64/trap.c:910
> #19 0xffffffff807b95cd in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:384
> #20 0x000000000040cb8c in ?? ()
> Previous frame inner to this frame (corrupt stack?)
> (kgdb) f 14
> #14 0xffffffff80503789 in proc_reparent (child=0xfffffe001290a000, parent=0xfffffe0012e48900) at /usr/src/sys/kern/kern_exit.c:912
> 912             PROC_LOCK(parent);
> (kgdb) p *child
> $2 = {p_list = {le_next = 0x0, le_prev = 0xffffffff80cba328}, p_threads = {tqh_first = 0xfffffe001287c000, tqh_last = 0xfffffe001287c010}, p_slock = {lock_object = {
>       lo_name = 0xffffffff80954527 "process slock", lo_flags = 720896, lo_data = 0, lo_witness = 0x0}, mtx_lock = 4}, p_ucred = 0xfffffe00427b5a00, p_fd = 0x0, 
>   p_fdtol = 0x0, p_stats = 0xfffffe0005d82800, p_limit = 0x0, p_limco = {c_links = {sle = {sle_next = 0x0}, tqe = {tqe_next = 0x0, tqe_prev = 0x0}}, c_time = 0, c_arg = 0x0, 
>     c_func = 0, c_lock = 0xfffffe001290a0f8, c_flags = 0, c_cpu = 0}, p_sigacts = 0xfffffe0042634000, p_flag = 268460034, p_state = PRS_ZOMBIE, p_pid = 3515, p_hash = {
>     le_next = 0x0, le_prev = 0xffffff8000222dd8}, p_pglist = {le_next = 0x0, le_prev = 0xfffffe004a876790}, p_pptr = 0xfffffe0002881900, p_sibling = {
>     le_next = 0xfffffe0042640900, le_prev = 0xfffffe00028819f0}, p_children = {lh_first = 0x0}, p_mtx = {lock_object = {lo_name = 0xffffffff8095451a "process lock", 
>       lo_flags = 21168128, lo_data = 0, lo_witness = 0x0}, mtx_lock = 18446741874728777920}, p_ksi = 0xfffffe0005c60150, p_sigqueue = {sq_signals = {__bits = {0, 0, 0, 0}}, 
>     sq_kill = {__bits = {0, 0, 0, 0}}, sq_list = {tqh_first = 0x0, tqh_last = 0xfffffe001290a140}, sq_proc = 0xfffffe001290a000, sq_flags = 1}, p_oppid = 0, 
>   p_vmspace = 0xffffffff80cb7760, p_swtick = 21105, p_realtimer = {it_interval = {tv_sec = 0, tv_usec = 0}, it_value = {tv_sec = 0, tv_usec = 0}}, p_ru = {ru_utime = {
>       tv_sec = 0, tv_usec = 0}, ru_stime = {tv_sec = 0, tv_usec = 0}, ru_maxrss = 21044, ru_ixrss = 544640, ru_idrss = 334244, ru_isrss = 27264, ru_minflt = 2745, 
>     ru_majflt = 2158, ru_nswap = 0, ru_inblock = 3, ru_oublock = 0, ru_msgsnd = 134, ru_msgrcv = 103, ru_nsignals = 0, ru_nvcsw = 473, ru_nivcsw = 533}, p_rux = {
>     rux_runtime = 2346291660, rux_uticks = 119, rux_sticks = 29, rux_iticks = 0, rux_uu = 948699, rux_su = 223223, rux_tu = 1171923}, p_crux = {rux_runtime = 0, 
>     rux_uticks = 0, rux_sticks = 0, rux_iticks = 0, rux_uu = 0, rux_su = 0, rux_tu = 0}, p_profthreads = 0, p_exitthreads = 0, p_traceflag = 0, p_tracevp = 0x0, 
>   p_tracecred = 0x0, p_textvp = 0x0, p_lock = 0, p_sigiolst = {slh_first = 0x0}, p_sigparent = 20, p_sig = 0, p_code = 0, p_stops = 0, p_stype = 0, p_step = 0 '\0', 
>   p_pfsflags = 0 '\0', p_nlminfo = 0x0, p_aioinfo = 0x0, p_singlethread = 0x0, p_suspcount = 0, p_xthread = 0xfffffe001287c000, p_boundary_count = 0, p_pendingcnt = 0, 
>   p_itimers = 0x0, p_magic = 3203398350, p_osrel = 900034, p_comm = "claws-mail\000\000\000\000\000\000\000\000\000", p_pgrp = 0xfffffe004a876780, 
>   p_sysent = 0xffffffff80c6b320, p_args = 0xfffffe0005c8b3c0, p_cpulimit = 9223372036854775807, p_nice = 0 '\0', p_fibnum = 0, p_xstat = 9, p_klist = {kl_list = {
>       slh_first = 0x0}, kl_lock = 0, kl_unlock = 0, kl_assert_locked = 0xffffffff804fd960 <knlist_mtx_assert_locked>, 
>     kl_assert_unlocked = 0xffffffff804fd940 <knlist_mtx_assert_unlocked>, kl_lockarg = 0x0}, p_numthreads = 1, p_md = {md_ldt = 0x0, md_ldt_sd = {sd_lolimit = 0, 
>       sd_lobase = 0, sd_type = 0, sd_dpl = 0, sd_p = 0, sd_hilimit = 0, sd_xx0 = 0, sd_gran = 0, sd_hibase = 0, sd_xx1 = 0, sd_mbz = 0, sd_xx2 = 0}}, p_itcallout = {
>     c_links = {sle = {sle_next = 0x0}, tqe = {tqe_next = 0x0, tqe_prev = 0x0}}, c_time = 0, c_arg = 0x0, c_func = 0, c_lock = 0x0, c_flags = 16, c_cpu = 0}, p_acflag = 16, 
>   p_peers = 0x0, p_leader = 0xfffffe001290a000, p_emuldata = 0x0, p_label = 0x0, p_sched = 0xfffffe001290a480, p_ktr = {stqh_first = 0x0, stqh_last = 0xfffffe001290a430}, 
>   p_mqnotifier = {lh_first = 0x0}, p_dtrace = 0xfffffe0005c7a740, p_pwait = {cv_description = 0xffffffff80954fbe "ppwait", cv_waiters = 0}, p_dbgwait = {
>     cv_description = 0xffffffff80954fc5 "dbgwait", cv_waiters = 0}, p_prev_runtime = 0, p_racct = 0x0}
> (kgdb) p *parent
> $3 = {p_list = {le_next = 0xfffffe004261b000, le_prev = 0xffffffff80cba320}, p_threads = {tqh_first = 0xfffffe0005e0d8c0, tqh_last = 0xfffffe0005e0d8d0}, p_slock = {
>     lock_object = {lo_name = 0xffffffff80954527 "process slock", lo_flags = 720896, lo_data = 0, lo_witness = 0x0}, mtx_lock = 4}, p_ucred = 0xfffffe00427b5a00, 
>   p_fd = 0xfffffe00128b1c00, p_fdtol = 0x0, p_stats = 0xfffffe001290c800, p_limit = 0xfffffe00427b5300, p_limco = {c_links = {sle = {sle_next = 0x0}, tqe = {tqe_next = 0x0, 
>         tqe_prev = 0x0}}, c_time = 0, c_arg = 0x0, c_func = 0, c_lock = 0xfffffe0012e489f8, c_flags = 0, c_cpu = 0}, p_sigacts = 0xfffffe0012e9c000, p_flag = 268451842, 
>   p_state = PRS_NORMAL, p_pid = 3407, p_hash = {le_next = 0x0, le_prev = 0xffffff8000222a78}, p_pglist = {le_next = 0x0, le_prev = 0xfffffe0005e59310}, 
>   p_pptr = 0xfffffe004261b000, p_sibling = {le_next = 0x0, le_prev = 0xfffffe004261b0f0}, p_children = {lh_first = 0x0}, p_mtx = {lock_object = {
>       lo_name = 0xffffffff8095451a "process lock", lo_flags = 21168128, lo_data = 0, lo_witness = 0x0}, mtx_lock = 18446741874728777920}, p_ksi = 0xfffffe0005c5eb60, 
>   p_sigqueue = {sq_signals = {__bits = {0, 0, 0, 0}}, sq_kill = {__bits = {0, 0, 0, 0}}, sq_list = {tqh_first = 0x0, tqh_last = 0xfffffe0012e48a40}, 
>     sq_proc = 0xfffffe0012e48900, sq_flags = 1}, p_oppid = 0, p_vmspace = 0xfffffe00128c5310, p_swtick = 12384, p_realtimer = {it_interval = {tv_sec = 0, tv_usec = 0}, 
>     it_value = {tv_sec = 0, tv_usec = 0}}, p_ru = {ru_utime = {tv_sec = 0, tv_usec = 0}, ru_stime = {tv_sec = 0, tv_usec = 0}, ru_maxrss = 0, ru_ixrss = 0, ru_idrss = 0, 
>     ru_isrss = 0, ru_minflt = 0, ru_majflt = 0, ru_nswap = 0, ru_inblock = 0, ru_oublock = 0, ru_msgsnd = 0, ru_msgrcv = 0, ru_nsignals = 0, ru_nvcsw = 0, ru_nivcsw = 0}, 
>   p_rux = {rux_runtime = 1633285860, rux_uticks = 63, rux_sticks = 44, rux_iticks = 0, rux_uu = 482021, rux_su = 336649, rux_tu = 818671}, p_crux = {rux_runtime = 770335510, 
>     rux_uticks = 11, rux_sticks = 39, rux_iticks = 0, rux_uu = 84947, rux_su = 301176, rux_tu = 386124}, p_profthreads = 0, p_exitthreads = 0, p_traceflag = 0, 
>   p_tracevp = 0x0, p_tracecred = 0x0, p_textvp = 0xfffffe00427ae780, p_lock = 0, p_sigiolst = {slh_first = 0x0}, p_sigparent = 20, p_sig = 0, p_code = 0, p_stops = 0, 
>   p_stype = 0, p_step = 0 '\0', p_pfsflags = 0 '\0', p_nlminfo = 0x0, p_aioinfo = 0x0, p_singlethread = 0x0, p_suspcount = 0, p_xthread = 0x0, p_boundary_count = 0, 
>   p_pendingcnt = 0, p_itimers = 0x0, p_magic = 3203398350, p_osrel = 900025, p_comm = "bash", '\0' <repeats 15 times>, p_pgrp = 0xfffffe0005e59300, 
>   p_sysent = 0xffffffff80c6b320, p_args = 0xfffffe0002ba2880, p_cpulimit = 9223372036854775807, p_nice = 0 '\0', p_fibnum = 0, p_xstat = 0, p_klist = {kl_list = {
>       slh_first = 0x0}, kl_lock = 0xffffffff804fd660 <knlist_mtx_lock>, kl_unlock = 0xffffffff804fd680 <knlist_mtx_unlock>, 
>     kl_assert_locked = 0xffffffff804fd960 <knlist_mtx_assert_locked>, kl_assert_unlocked = 0xffffffff804fd940 <knlist_mtx_assert_unlocked>, kl_lockarg = 0xfffffe0012e489f8}, 
>   p_numthreads = 1, p_md = {md_ldt = 0x0, md_ldt_sd = {sd_lolimit = 0, sd_lobase = 0, sd_type = 0, sd_dpl = 0, sd_p = 0, sd_hilimit = 0, sd_xx0 = 0, sd_gran = 0, 
>       sd_hibase = 0, sd_xx1 = 0, sd_mbz = 0, sd_xx2 = 0}}, p_itcallout = {c_links = {sle = {sle_next = 0x0}, tqe = {tqe_next = 0x0, tqe_prev = 0x0}}, c_time = 0, 
>     c_arg = 0x0, c_func = 0, c_lock = 0x0, c_flags = 16, c_cpu = 0}, p_acflag = 0, p_peers = 0x0, p_leader = 0xfffffe0012e48900, p_emuldata = 0x0, p_label = 0x0, 
>   p_sched = 0xfffffe0012e48d80, p_ktr = {stqh_first = 0x0, stqh_last = 0xfffffe0012e48d30}, p_mqnotifier = {lh_first = 0x0}, p_dtrace = 0xfffffe0005d85b80, p_pwait = {
>     cv_description = 0xffffffff80954fbe "ppwait", cv_waiters = 0}, p_dbgwait = {cv_description = 0xffffffff80954fc5 "dbgwait", cv_waiters = 0}, p_prev_runtime = 0, 
>   p_racct = 0x0}
> 
> I'm not sure if switching to the console first matters.
> 
> Before the panic, the states of gdb and claws-mail were:
> 
> fk_at_r500 ~ $procstat -kk $(pgrep claws-mail)
>   PID    TID COMM             TDNAME           KSTACK                       
> 15813 100528 claws-mail       initial thread   mi_switch+0x21d thread_suspend_switch+0x103 ptracestop+0x7a cursig+0x4af ast+0x198 doreti_ast+0x1f 
> fk_at_r500 ~ $procstat -kk $(pgrep gdb)
>   PID    TID COMM             TDNAME           KSTACK                       
> 15823 100928 gdb              -                mi_switch+0x21d sleepq_switch+0x123 sleepq_catch_signals+0x2a2 sleepq_wait_sig+0x16 _cv_wait_sig+0x14c tty_wait+0x48 ttydisc_read+0xd4 ttydev_read+0xab devfs_read_f+0x88 dofileread+0xa1 kern_readv+0x60 read+0x55 syscallenter+0x1cb syscall+0x4c Xfast_syscall+0xdd 
> 
> I believe I also reproduced the problem with a kernel from Tuesday,
> but dumping core from ddb failed so I can't check.
> 
> I cannot reproduce the panic with a kernel from 2011-03-12
> which I haven't deleted yet, but didn't try to bisect it further.
> 
> gdb can't attach to claws-mail either, but the panic doesn't occur:
> 
> fk_at_r500 ~ $gdb -p $(pgrep claws-mail)
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "amd64-marcel-freebsd".
> Attaching to process 3432
> /usr/src/gnu/usr.bin/gdb/libgdb/../../../../contrib/gdb/gdb/solib-svr4.c:1444: i
> nternal-error: legacy_fetch_link_map_offsets called without legacy link_map supp
> ort enabled.
> A problem internal to GDB has been detected,
> further debugging may prove unreliable.
> Quit this debugging session? (y or n) y
> 
> /usr/src/gnu/usr.bin/gdb/libgdb/../../../../contrib/gdb/gdb/solib-svr4.c:1444: i
> nternal-error: legacy_fetch_link_map_offsets called without legacy link_map supp
> ort enabled.
> A problem internal to GDB has been detected,
> further debugging may prove unreliable.
> Create a core file of GDB? (y or n) n
> 
> In related news, the kernel from yesterday also seems to panic/hang/whatever
> when claws-mail segfaults due to an invalid memory access, but as I was running
> X, I got no core dumps for those problems either.

For the first problem, try patch at the end.

Regarding the 'panic/hang/whatever', I cannot reproduce it with a program
that specifically access unmapped memory. You would need to catch core
dump and provide the backtrace.

diff --git a/sys/kern/kern_exit.c b/sys/kern/kern_exit.c
index 5fec31b..2d35ed5 100644
--- a/sys/kern/kern_exit.c
+++ b/sys/kern/kern_exit.c
_at__at_ -903,15 +903,19 _at__at_ loop:
 void
 proc_reparent(struct proc *child, struct proc *parent)
 {
+	int locked;
 
 	sx_assert(&proctree_lock, SX_XLOCKED);
 	PROC_LOCK_ASSERT(child, MA_OWNED);
 	if (child->p_pptr == parent)
 		return;
 
-	PROC_LOCK(parent);
+	locked = PROC_LOCKED(parent);
+	if (!locked)
+		PROC_LOCK(parent);
 	racct_add_force(parent, RACCT_NPROC, 1);
-	PROC_UNLOCK(parent);
+	if (!locked)
+		PROC_UNLOCK(parent);
 	PROC_LOCK(child->p_pptr);
 	racct_sub(child->p_pptr, RACCT_NPROC, 1);
 	sigqueue_take(child->p_ksi);

Received on Sun Apr 10 2011 - 13:18:54 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:13 UTC