panic: _mtx_lock_sleep: recursed on non-recursive mutex process lock @ /usr/src/sys/kern/kern_exit.c:912

From: Fabian Keil <freebsd-listen_at_fabiankeil.de>
Date: Sun, 10 Apr 2011 15:37:59 +0200
The following panic seems to be reliably reproducible with sources
from yesterday (and today) by running claws-mail in X, switching to
the console and trying to attach gdb to it with 'gdb -p $(pgrep claws-mail)'.

gdb somehow fails to attach, and after leaving gdb the panic occurs.

fk_at_r500 /usr/crash $cat info.0 
Dump header from device /dev/ada0s1b
  Architecture: amd64           
  Architecture Version: 2       
  Dump Length: 347676672B (331 MB)
  Blocksize: 512                
  Dumptime: Sat Apr  9 22:48:15 2011
  Hostname: r500.local          
  Magic: FreeBSD Kernel Dump    
  Version String: FreeBSD 9.0-CURRENT #284 r+1c500d9: Sat Apr  9 15:44:51 CEST 2011
    fk_at_r500.local:/usr/obj/usr/src/sys/ZOEY
  Panic String: _mtx_lock_sleep: recursed on non-recursive mutex process lock _at_ /usr/src/sys/kern/kern_exit.c:912

  Dump Parity: 1557143341
  Bounds: 0
  Dump Status: good

fk_at_r500 /usr/crash $kgdb /boot/kernel/kernel vmcore.0 
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
panic: _mtx_lock_sleep: recursed on non-recursive mutex process lock _at_ /usr/src/sys/kern/kern_exit.c:912

cpuid = 0
KDB: enter: panic
panic: from debugger
cpuid = 0
Uptime: 3m48s
Physical memory: 1974 MB
Dumping 331 MB: 316 300 284 268 252 236 220 204 188 172 156 140 124 108 92 76 60 44 28 12

Reading symbols from /boot/kernel/zfs.ko...Reading symbols from /boot/kernel/zfs.ko.symbols...done.
done.
[...]
Loaded symbols for /boot/kernel/drm.ko
#0  doadump () at /usr/src/sys/kern/kern_shutdown.c:250
250             if (textdump_pending)
(kgdb) where
#0  doadump () at /usr/src/sys/kern/kern_shutdown.c:250
#1  0xffffffff80531fb7 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:418
#2  0xffffffff80531a51 in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:591
#3  0xffffffff803202b7 in db_panic (addr=Variable "addr" is not available.
) at /usr/src/sys/ddb/db_command.c:478
#4  0xffffffff80320761 in db_command (last_cmdp=0xffffffff80c9fb60, cmd_table=Variable "cmd_table" is not available.
) at /usr/src/sys/ddb/db_command.c:445
#5  0xffffffff803209b0 in db_command_loop () at /usr/src/sys/ddb/db_command.c:498
#6  0xffffffff80322a99 in db_trap (type=Variable "type" is not available.
) at /usr/src/sys/ddb/db_main.c:229
#7  0xffffffff80567ba1 in kdb_trap (type=3, code=0, tf=0xffffff8000017740) at /usr/src/sys/kern/subr_kdb.c:533
#8  0xffffffff807cee06 in trap (frame=0xffffff8000017740) at /usr/src/sys/amd64/amd64/trap.c:590
#9  0xffffffff807b9303 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:228
#10 0xffffffff805679fb in kdb_enter (why=0xffffffff80955faf "panic", msg=0xa <Address 0xa out of bounds>) at cpufunc.h:63
#11 0xffffffff80531a60 in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:574
#12 0xffffffff805222f9 in _mtx_lock_sleep (m=Variable "m" is not available.
) at /usr/src/sys/kern/kern_mutex.c:341
#13 0xffffffff805223a5 in _mtx_lock_flags (m=Variable "m" is not available.
) at /usr/src/sys/kern/kern_mutex.c:203
#14 0xffffffff80503789 in proc_reparent (child=0xfffffe001290a000, parent=0xfffffe0012e48900) at /usr/src/sys/kern/kern_exit.c:912
#15 0xffffffff80503ed8 in kern_wait (td=0xffffff8000017adc, pid=Variable "pid" is not available.
) at /usr/src/sys/kern/kern_exit.c:708
#16 0xffffffff805043e5 in wait4 (td=Variable "td" is not available.
) at /usr/src/sys/kern/kern_exit.c:653
#17 0xffffffff805751ab in syscallenter (td=0xfffffe00028838c0, sa=0xffffff8000017bb0) at /usr/src/sys/kern/subr_trap.c:344
#18 0xffffffff807ce8bc in syscall (frame=0xffffff8000017c50) at /usr/src/sys/amd64/amd64/trap.c:910
#19 0xffffffff807b95cd in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:384
#20 0x000000000040cb8c in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb) f 14
#14 0xffffffff80503789 in proc_reparent (child=0xfffffe001290a000, parent=0xfffffe0012e48900) at /usr/src/sys/kern/kern_exit.c:912
912             PROC_LOCK(parent);
(kgdb) p *child
$2 = {p_list = {le_next = 0x0, le_prev = 0xffffffff80cba328}, p_threads = {tqh_first = 0xfffffe001287c000, tqh_last = 0xfffffe001287c010}, p_slock = {lock_object = {
      lo_name = 0xffffffff80954527 "process slock", lo_flags = 720896, lo_data = 0, lo_witness = 0x0}, mtx_lock = 4}, p_ucred = 0xfffffe00427b5a00, p_fd = 0x0, 
  p_fdtol = 0x0, p_stats = 0xfffffe0005d82800, p_limit = 0x0, p_limco = {c_links = {sle = {sle_next = 0x0}, tqe = {tqe_next = 0x0, tqe_prev = 0x0}}, c_time = 0, c_arg = 0x0, 
    c_func = 0, c_lock = 0xfffffe001290a0f8, c_flags = 0, c_cpu = 0}, p_sigacts = 0xfffffe0042634000, p_flag = 268460034, p_state = PRS_ZOMBIE, p_pid = 3515, p_hash = {
    le_next = 0x0, le_prev = 0xffffff8000222dd8}, p_pglist = {le_next = 0x0, le_prev = 0xfffffe004a876790}, p_pptr = 0xfffffe0002881900, p_sibling = {
    le_next = 0xfffffe0042640900, le_prev = 0xfffffe00028819f0}, p_children = {lh_first = 0x0}, p_mtx = {lock_object = {lo_name = 0xffffffff8095451a "process lock", 
      lo_flags = 21168128, lo_data = 0, lo_witness = 0x0}, mtx_lock = 18446741874728777920}, p_ksi = 0xfffffe0005c60150, p_sigqueue = {sq_signals = {__bits = {0, 0, 0, 0}}, 
    sq_kill = {__bits = {0, 0, 0, 0}}, sq_list = {tqh_first = 0x0, tqh_last = 0xfffffe001290a140}, sq_proc = 0xfffffe001290a000, sq_flags = 1}, p_oppid = 0, 
  p_vmspace = 0xffffffff80cb7760, p_swtick = 21105, p_realtimer = {it_interval = {tv_sec = 0, tv_usec = 0}, it_value = {tv_sec = 0, tv_usec = 0}}, p_ru = {ru_utime = {
      tv_sec = 0, tv_usec = 0}, ru_stime = {tv_sec = 0, tv_usec = 0}, ru_maxrss = 21044, ru_ixrss = 544640, ru_idrss = 334244, ru_isrss = 27264, ru_minflt = 2745, 
    ru_majflt = 2158, ru_nswap = 0, ru_inblock = 3, ru_oublock = 0, ru_msgsnd = 134, ru_msgrcv = 103, ru_nsignals = 0, ru_nvcsw = 473, ru_nivcsw = 533}, p_rux = {
    rux_runtime = 2346291660, rux_uticks = 119, rux_sticks = 29, rux_iticks = 0, rux_uu = 948699, rux_su = 223223, rux_tu = 1171923}, p_crux = {rux_runtime = 0, 
    rux_uticks = 0, rux_sticks = 0, rux_iticks = 0, rux_uu = 0, rux_su = 0, rux_tu = 0}, p_profthreads = 0, p_exitthreads = 0, p_traceflag = 0, p_tracevp = 0x0, 
  p_tracecred = 0x0, p_textvp = 0x0, p_lock = 0, p_sigiolst = {slh_first = 0x0}, p_sigparent = 20, p_sig = 0, p_code = 0, p_stops = 0, p_stype = 0, p_step = 0 '\0', 
  p_pfsflags = 0 '\0', p_nlminfo = 0x0, p_aioinfo = 0x0, p_singlethread = 0x0, p_suspcount = 0, p_xthread = 0xfffffe001287c000, p_boundary_count = 0, p_pendingcnt = 0, 
  p_itimers = 0x0, p_magic = 3203398350, p_osrel = 900034, p_comm = "claws-mail\000\000\000\000\000\000\000\000\000", p_pgrp = 0xfffffe004a876780, 
  p_sysent = 0xffffffff80c6b320, p_args = 0xfffffe0005c8b3c0, p_cpulimit = 9223372036854775807, p_nice = 0 '\0', p_fibnum = 0, p_xstat = 9, p_klist = {kl_list = {
      slh_first = 0x0}, kl_lock = 0, kl_unlock = 0, kl_assert_locked = 0xffffffff804fd960 <knlist_mtx_assert_locked>, 
    kl_assert_unlocked = 0xffffffff804fd940 <knlist_mtx_assert_unlocked>, kl_lockarg = 0x0}, p_numthreads = 1, p_md = {md_ldt = 0x0, md_ldt_sd = {sd_lolimit = 0, 
      sd_lobase = 0, sd_type = 0, sd_dpl = 0, sd_p = 0, sd_hilimit = 0, sd_xx0 = 0, sd_gran = 0, sd_hibase = 0, sd_xx1 = 0, sd_mbz = 0, sd_xx2 = 0}}, p_itcallout = {
    c_links = {sle = {sle_next = 0x0}, tqe = {tqe_next = 0x0, tqe_prev = 0x0}}, c_time = 0, c_arg = 0x0, c_func = 0, c_lock = 0x0, c_flags = 16, c_cpu = 0}, p_acflag = 16, 
  p_peers = 0x0, p_leader = 0xfffffe001290a000, p_emuldata = 0x0, p_label = 0x0, p_sched = 0xfffffe001290a480, p_ktr = {stqh_first = 0x0, stqh_last = 0xfffffe001290a430}, 
  p_mqnotifier = {lh_first = 0x0}, p_dtrace = 0xfffffe0005c7a740, p_pwait = {cv_description = 0xffffffff80954fbe "ppwait", cv_waiters = 0}, p_dbgwait = {
    cv_description = 0xffffffff80954fc5 "dbgwait", cv_waiters = 0}, p_prev_runtime = 0, p_racct = 0x0}
(kgdb) p *parent
$3 = {p_list = {le_next = 0xfffffe004261b000, le_prev = 0xffffffff80cba320}, p_threads = {tqh_first = 0xfffffe0005e0d8c0, tqh_last = 0xfffffe0005e0d8d0}, p_slock = {
    lock_object = {lo_name = 0xffffffff80954527 "process slock", lo_flags = 720896, lo_data = 0, lo_witness = 0x0}, mtx_lock = 4}, p_ucred = 0xfffffe00427b5a00, 
  p_fd = 0xfffffe00128b1c00, p_fdtol = 0x0, p_stats = 0xfffffe001290c800, p_limit = 0xfffffe00427b5300, p_limco = {c_links = {sle = {sle_next = 0x0}, tqe = {tqe_next = 0x0, 
        tqe_prev = 0x0}}, c_time = 0, c_arg = 0x0, c_func = 0, c_lock = 0xfffffe0012e489f8, c_flags = 0, c_cpu = 0}, p_sigacts = 0xfffffe0012e9c000, p_flag = 268451842, 
  p_state = PRS_NORMAL, p_pid = 3407, p_hash = {le_next = 0x0, le_prev = 0xffffff8000222a78}, p_pglist = {le_next = 0x0, le_prev = 0xfffffe0005e59310}, 
  p_pptr = 0xfffffe004261b000, p_sibling = {le_next = 0x0, le_prev = 0xfffffe004261b0f0}, p_children = {lh_first = 0x0}, p_mtx = {lock_object = {
      lo_name = 0xffffffff8095451a "process lock", lo_flags = 21168128, lo_data = 0, lo_witness = 0x0}, mtx_lock = 18446741874728777920}, p_ksi = 0xfffffe0005c5eb60, 
  p_sigqueue = {sq_signals = {__bits = {0, 0, 0, 0}}, sq_kill = {__bits = {0, 0, 0, 0}}, sq_list = {tqh_first = 0x0, tqh_last = 0xfffffe0012e48a40}, 
    sq_proc = 0xfffffe0012e48900, sq_flags = 1}, p_oppid = 0, p_vmspace = 0xfffffe00128c5310, p_swtick = 12384, p_realtimer = {it_interval = {tv_sec = 0, tv_usec = 0}, 
    it_value = {tv_sec = 0, tv_usec = 0}}, p_ru = {ru_utime = {tv_sec = 0, tv_usec = 0}, ru_stime = {tv_sec = 0, tv_usec = 0}, ru_maxrss = 0, ru_ixrss = 0, ru_idrss = 0, 
    ru_isrss = 0, ru_minflt = 0, ru_majflt = 0, ru_nswap = 0, ru_inblock = 0, ru_oublock = 0, ru_msgsnd = 0, ru_msgrcv = 0, ru_nsignals = 0, ru_nvcsw = 0, ru_nivcsw = 0}, 
  p_rux = {rux_runtime = 1633285860, rux_uticks = 63, rux_sticks = 44, rux_iticks = 0, rux_uu = 482021, rux_su = 336649, rux_tu = 818671}, p_crux = {rux_runtime = 770335510, 
    rux_uticks = 11, rux_sticks = 39, rux_iticks = 0, rux_uu = 84947, rux_su = 301176, rux_tu = 386124}, p_profthreads = 0, p_exitthreads = 0, p_traceflag = 0, 
  p_tracevp = 0x0, p_tracecred = 0x0, p_textvp = 0xfffffe00427ae780, p_lock = 0, p_sigiolst = {slh_first = 0x0}, p_sigparent = 20, p_sig = 0, p_code = 0, p_stops = 0, 
  p_stype = 0, p_step = 0 '\0', p_pfsflags = 0 '\0', p_nlminfo = 0x0, p_aioinfo = 0x0, p_singlethread = 0x0, p_suspcount = 0, p_xthread = 0x0, p_boundary_count = 0, 
  p_pendingcnt = 0, p_itimers = 0x0, p_magic = 3203398350, p_osrel = 900025, p_comm = "bash", '\0' <repeats 15 times>, p_pgrp = 0xfffffe0005e59300, 
  p_sysent = 0xffffffff80c6b320, p_args = 0xfffffe0002ba2880, p_cpulimit = 9223372036854775807, p_nice = 0 '\0', p_fibnum = 0, p_xstat = 0, p_klist = {kl_list = {
      slh_first = 0x0}, kl_lock = 0xffffffff804fd660 <knlist_mtx_lock>, kl_unlock = 0xffffffff804fd680 <knlist_mtx_unlock>, 
    kl_assert_locked = 0xffffffff804fd960 <knlist_mtx_assert_locked>, kl_assert_unlocked = 0xffffffff804fd940 <knlist_mtx_assert_unlocked>, kl_lockarg = 0xfffffe0012e489f8}, 
  p_numthreads = 1, p_md = {md_ldt = 0x0, md_ldt_sd = {sd_lolimit = 0, sd_lobase = 0, sd_type = 0, sd_dpl = 0, sd_p = 0, sd_hilimit = 0, sd_xx0 = 0, sd_gran = 0, 
      sd_hibase = 0, sd_xx1 = 0, sd_mbz = 0, sd_xx2 = 0}}, p_itcallout = {c_links = {sle = {sle_next = 0x0}, tqe = {tqe_next = 0x0, tqe_prev = 0x0}}, c_time = 0, 
    c_arg = 0x0, c_func = 0, c_lock = 0x0, c_flags = 16, c_cpu = 0}, p_acflag = 0, p_peers = 0x0, p_leader = 0xfffffe0012e48900, p_emuldata = 0x0, p_label = 0x0, 
  p_sched = 0xfffffe0012e48d80, p_ktr = {stqh_first = 0x0, stqh_last = 0xfffffe0012e48d30}, p_mqnotifier = {lh_first = 0x0}, p_dtrace = 0xfffffe0005d85b80, p_pwait = {
    cv_description = 0xffffffff80954fbe "ppwait", cv_waiters = 0}, p_dbgwait = {cv_description = 0xffffffff80954fc5 "dbgwait", cv_waiters = 0}, p_prev_runtime = 0, 
  p_racct = 0x0}

I'm not sure if switching to the console first matters.

Before the panic, the states of gdb and claws-mail were:

fk_at_r500 ~ $procstat -kk $(pgrep claws-mail)
  PID    TID COMM             TDNAME           KSTACK                       
15813 100528 claws-mail       initial thread   mi_switch+0x21d thread_suspend_switch+0x103 ptracestop+0x7a cursig+0x4af ast+0x198 doreti_ast+0x1f 
fk_at_r500 ~ $procstat -kk $(pgrep gdb)
  PID    TID COMM             TDNAME           KSTACK                       
15823 100928 gdb              -                mi_switch+0x21d sleepq_switch+0x123 sleepq_catch_signals+0x2a2 sleepq_wait_sig+0x16 _cv_wait_sig+0x14c tty_wait+0x48 ttydisc_read+0xd4 ttydev_read+0xab devfs_read_f+0x88 dofileread+0xa1 kern_readv+0x60 read+0x55 syscallenter+0x1cb syscall+0x4c Xfast_syscall+0xdd 

I believe I also reproduced the problem with a kernel from Tuesday,
but dumping core from ddb failed so I can't check.

I cannot reproduce the panic with a kernel from 2011-03-12
which I haven't deleted yet, but didn't try to bisect it further.

gdb can't attach to claws-mail either, but the panic doesn't occur:

fk_at_r500 ~ $gdb -p $(pgrep claws-mail)
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd".
Attaching to process 3432
/usr/src/gnu/usr.bin/gdb/libgdb/../../../../contrib/gdb/gdb/solib-svr4.c:1444: i
nternal-error: legacy_fetch_link_map_offsets called without legacy link_map supp
ort enabled.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n) y

/usr/src/gnu/usr.bin/gdb/libgdb/../../../../contrib/gdb/gdb/solib-svr4.c:1444: i
nternal-error: legacy_fetch_link_map_offsets called without legacy link_map supp
ort enabled.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Create a core file of GDB? (y or n) n

In related news, the kernel from yesterday also seems to panic/hang/whatever
when claws-mail segfaults due to an invalid memory access, but as I was running
X, I got no core dumps for those problems either.

Fabian

Received on Sun Apr 10 2011 - 11:53:18 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:13 UTC