On Sun, Apr 10, 2011 at 03:37:59PM +0200, Fabian Keil wrote: > The following panic seems to be reliably reproducible with sources > from yesterday (and today) by running claws-mail in X, switching to > the console and trying to attach gdb to it with 'gdb -p $(pgrep claws-mail)'. > > gdb somehow fails to attach, and after leaving gdb the panic occurs. > > fk_at_r500 /usr/crash $cat info.0 > Dump header from device /dev/ada0s1b > Architecture: amd64 > Architecture Version: 2 > Dump Length: 347676672B (331 MB) > Blocksize: 512 > Dumptime: Sat Apr 9 22:48:15 2011 > Hostname: r500.local > Magic: FreeBSD Kernel Dump > Version String: FreeBSD 9.0-CURRENT #284 r+1c500d9: Sat Apr 9 15:44:51 CEST 2011 > fk_at_r500.local:/usr/obj/usr/src/sys/ZOEY > Panic String: _mtx_lock_sleep: recursed on non-recursive mutex process lock _at_ /usr/src/sys/kern/kern_exit.c:912 > > Dump Parity: 1557143341 > Bounds: 0 > Dump Status: good > > fk_at_r500 /usr/crash $kgdb /boot/kernel/kernel vmcore.0 > GNU gdb 6.1.1 [FreeBSD] > Copyright 2004 Free Software Foundation, Inc. > GDB is free software, covered by the GNU General Public License, and you are > welcome to change it and/or distribute copies of it under certain conditions. > Type "show copying" to see the conditions. > There is absolutely no warranty for GDB. Type "show warranty" for details. > This GDB was configured as "amd64-marcel-freebsd"... > > Unread portion of the kernel message buffer: > panic: _mtx_lock_sleep: recursed on non-recursive mutex process lock _at_ /usr/src/sys/kern/kern_exit.c:912 > > cpuid = 0 > KDB: enter: panic > panic: from debugger > cpuid = 0 > Uptime: 3m48s > Physical memory: 1974 MB > Dumping 331 MB: 316 300 284 268 252 236 220 204 188 172 156 140 124 108 92 76 60 44 28 12 > > Reading symbols from /boot/kernel/zfs.ko...Reading symbols from /boot/kernel/zfs.ko.symbols...done. > done. > [...] > Loaded symbols for /boot/kernel/drm.ko > #0 doadump () at /usr/src/sys/kern/kern_shutdown.c:250 > 250 if (textdump_pending) > (kgdb) where > #0 doadump () at /usr/src/sys/kern/kern_shutdown.c:250 > #1 0xffffffff80531fb7 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:418 > #2 0xffffffff80531a51 in panic (fmt=Variable "fmt" is not available. > ) at /usr/src/sys/kern/kern_shutdown.c:591 > #3 0xffffffff803202b7 in db_panic (addr=Variable "addr" is not available. > ) at /usr/src/sys/ddb/db_command.c:478 > #4 0xffffffff80320761 in db_command (last_cmdp=0xffffffff80c9fb60, cmd_table=Variable "cmd_table" is not available. > ) at /usr/src/sys/ddb/db_command.c:445 > #5 0xffffffff803209b0 in db_command_loop () at /usr/src/sys/ddb/db_command.c:498 > #6 0xffffffff80322a99 in db_trap (type=Variable "type" is not available. > ) at /usr/src/sys/ddb/db_main.c:229 > #7 0xffffffff80567ba1 in kdb_trap (type=3, code=0, tf=0xffffff8000017740) at /usr/src/sys/kern/subr_kdb.c:533 > #8 0xffffffff807cee06 in trap (frame=0xffffff8000017740) at /usr/src/sys/amd64/amd64/trap.c:590 > #9 0xffffffff807b9303 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:228 > #10 0xffffffff805679fb in kdb_enter (why=0xffffffff80955faf "panic", msg=0xa <Address 0xa out of bounds>) at cpufunc.h:63 > #11 0xffffffff80531a60 in panic (fmt=Variable "fmt" is not available. > ) at /usr/src/sys/kern/kern_shutdown.c:574 > #12 0xffffffff805222f9 in _mtx_lock_sleep (m=Variable "m" is not available. > ) at /usr/src/sys/kern/kern_mutex.c:341 > #13 0xffffffff805223a5 in _mtx_lock_flags (m=Variable "m" is not available. > ) at /usr/src/sys/kern/kern_mutex.c:203 > #14 0xffffffff80503789 in proc_reparent (child=0xfffffe001290a000, parent=0xfffffe0012e48900) at /usr/src/sys/kern/kern_exit.c:912 > #15 0xffffffff80503ed8 in kern_wait (td=0xffffff8000017adc, pid=Variable "pid" is not available. > ) at /usr/src/sys/kern/kern_exit.c:708 > #16 0xffffffff805043e5 in wait4 (td=Variable "td" is not available. > ) at /usr/src/sys/kern/kern_exit.c:653 > #17 0xffffffff805751ab in syscallenter (td=0xfffffe00028838c0, sa=0xffffff8000017bb0) at /usr/src/sys/kern/subr_trap.c:344 > #18 0xffffffff807ce8bc in syscall (frame=0xffffff8000017c50) at /usr/src/sys/amd64/amd64/trap.c:910 > #19 0xffffffff807b95cd in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:384 > #20 0x000000000040cb8c in ?? () > Previous frame inner to this frame (corrupt stack?) > (kgdb) f 14 > #14 0xffffffff80503789 in proc_reparent (child=0xfffffe001290a000, parent=0xfffffe0012e48900) at /usr/src/sys/kern/kern_exit.c:912 > 912 PROC_LOCK(parent); > (kgdb) p *child > $2 = {p_list = {le_next = 0x0, le_prev = 0xffffffff80cba328}, p_threads = {tqh_first = 0xfffffe001287c000, tqh_last = 0xfffffe001287c010}, p_slock = {lock_object = { > lo_name = 0xffffffff80954527 "process slock", lo_flags = 720896, lo_data = 0, lo_witness = 0x0}, mtx_lock = 4}, p_ucred = 0xfffffe00427b5a00, p_fd = 0x0, > p_fdtol = 0x0, p_stats = 0xfffffe0005d82800, p_limit = 0x0, p_limco = {c_links = {sle = {sle_next = 0x0}, tqe = {tqe_next = 0x0, tqe_prev = 0x0}}, c_time = 0, c_arg = 0x0, > c_func = 0, c_lock = 0xfffffe001290a0f8, c_flags = 0, c_cpu = 0}, p_sigacts = 0xfffffe0042634000, p_flag = 268460034, p_state = PRS_ZOMBIE, p_pid = 3515, p_hash = { > le_next = 0x0, le_prev = 0xffffff8000222dd8}, p_pglist = {le_next = 0x0, le_prev = 0xfffffe004a876790}, p_pptr = 0xfffffe0002881900, p_sibling = { > le_next = 0xfffffe0042640900, le_prev = 0xfffffe00028819f0}, p_children = {lh_first = 0x0}, p_mtx = {lock_object = {lo_name = 0xffffffff8095451a "process lock", > lo_flags = 21168128, lo_data = 0, lo_witness = 0x0}, mtx_lock = 18446741874728777920}, p_ksi = 0xfffffe0005c60150, p_sigqueue = {sq_signals = {__bits = {0, 0, 0, 0}}, > sq_kill = {__bits = {0, 0, 0, 0}}, sq_list = {tqh_first = 0x0, tqh_last = 0xfffffe001290a140}, sq_proc = 0xfffffe001290a000, sq_flags = 1}, p_oppid = 0, > p_vmspace = 0xffffffff80cb7760, p_swtick = 21105, p_realtimer = {it_interval = {tv_sec = 0, tv_usec = 0}, it_value = {tv_sec = 0, tv_usec = 0}}, p_ru = {ru_utime = { > tv_sec = 0, tv_usec = 0}, ru_stime = {tv_sec = 0, tv_usec = 0}, ru_maxrss = 21044, ru_ixrss = 544640, ru_idrss = 334244, ru_isrss = 27264, ru_minflt = 2745, > ru_majflt = 2158, ru_nswap = 0, ru_inblock = 3, ru_oublock = 0, ru_msgsnd = 134, ru_msgrcv = 103, ru_nsignals = 0, ru_nvcsw = 473, ru_nivcsw = 533}, p_rux = { > rux_runtime = 2346291660, rux_uticks = 119, rux_sticks = 29, rux_iticks = 0, rux_uu = 948699, rux_su = 223223, rux_tu = 1171923}, p_crux = {rux_runtime = 0, > rux_uticks = 0, rux_sticks = 0, rux_iticks = 0, rux_uu = 0, rux_su = 0, rux_tu = 0}, p_profthreads = 0, p_exitthreads = 0, p_traceflag = 0, p_tracevp = 0x0, > p_tracecred = 0x0, p_textvp = 0x0, p_lock = 0, p_sigiolst = {slh_first = 0x0}, p_sigparent = 20, p_sig = 0, p_code = 0, p_stops = 0, p_stype = 0, p_step = 0 '\0', > p_pfsflags = 0 '\0', p_nlminfo = 0x0, p_aioinfo = 0x0, p_singlethread = 0x0, p_suspcount = 0, p_xthread = 0xfffffe001287c000, p_boundary_count = 0, p_pendingcnt = 0, > p_itimers = 0x0, p_magic = 3203398350, p_osrel = 900034, p_comm = "claws-mail\000\000\000\000\000\000\000\000\000", p_pgrp = 0xfffffe004a876780, > p_sysent = 0xffffffff80c6b320, p_args = 0xfffffe0005c8b3c0, p_cpulimit = 9223372036854775807, p_nice = 0 '\0', p_fibnum = 0, p_xstat = 9, p_klist = {kl_list = { > slh_first = 0x0}, kl_lock = 0, kl_unlock = 0, kl_assert_locked = 0xffffffff804fd960 <knlist_mtx_assert_locked>, > kl_assert_unlocked = 0xffffffff804fd940 <knlist_mtx_assert_unlocked>, kl_lockarg = 0x0}, p_numthreads = 1, p_md = {md_ldt = 0x0, md_ldt_sd = {sd_lolimit = 0, > sd_lobase = 0, sd_type = 0, sd_dpl = 0, sd_p = 0, sd_hilimit = 0, sd_xx0 = 0, sd_gran = 0, sd_hibase = 0, sd_xx1 = 0, sd_mbz = 0, sd_xx2 = 0}}, p_itcallout = { > c_links = {sle = {sle_next = 0x0}, tqe = {tqe_next = 0x0, tqe_prev = 0x0}}, c_time = 0, c_arg = 0x0, c_func = 0, c_lock = 0x0, c_flags = 16, c_cpu = 0}, p_acflag = 16, > p_peers = 0x0, p_leader = 0xfffffe001290a000, p_emuldata = 0x0, p_label = 0x0, p_sched = 0xfffffe001290a480, p_ktr = {stqh_first = 0x0, stqh_last = 0xfffffe001290a430}, > p_mqnotifier = {lh_first = 0x0}, p_dtrace = 0xfffffe0005c7a740, p_pwait = {cv_description = 0xffffffff80954fbe "ppwait", cv_waiters = 0}, p_dbgwait = { > cv_description = 0xffffffff80954fc5 "dbgwait", cv_waiters = 0}, p_prev_runtime = 0, p_racct = 0x0} > (kgdb) p *parent > $3 = {p_list = {le_next = 0xfffffe004261b000, le_prev = 0xffffffff80cba320}, p_threads = {tqh_first = 0xfffffe0005e0d8c0, tqh_last = 0xfffffe0005e0d8d0}, p_slock = { > lock_object = {lo_name = 0xffffffff80954527 "process slock", lo_flags = 720896, lo_data = 0, lo_witness = 0x0}, mtx_lock = 4}, p_ucred = 0xfffffe00427b5a00, > p_fd = 0xfffffe00128b1c00, p_fdtol = 0x0, p_stats = 0xfffffe001290c800, p_limit = 0xfffffe00427b5300, p_limco = {c_links = {sle = {sle_next = 0x0}, tqe = {tqe_next = 0x0, > tqe_prev = 0x0}}, c_time = 0, c_arg = 0x0, c_func = 0, c_lock = 0xfffffe0012e489f8, c_flags = 0, c_cpu = 0}, p_sigacts = 0xfffffe0012e9c000, p_flag = 268451842, > p_state = PRS_NORMAL, p_pid = 3407, p_hash = {le_next = 0x0, le_prev = 0xffffff8000222a78}, p_pglist = {le_next = 0x0, le_prev = 0xfffffe0005e59310}, > p_pptr = 0xfffffe004261b000, p_sibling = {le_next = 0x0, le_prev = 0xfffffe004261b0f0}, p_children = {lh_first = 0x0}, p_mtx = {lock_object = { > lo_name = 0xffffffff8095451a "process lock", lo_flags = 21168128, lo_data = 0, lo_witness = 0x0}, mtx_lock = 18446741874728777920}, p_ksi = 0xfffffe0005c5eb60, > p_sigqueue = {sq_signals = {__bits = {0, 0, 0, 0}}, sq_kill = {__bits = {0, 0, 0, 0}}, sq_list = {tqh_first = 0x0, tqh_last = 0xfffffe0012e48a40}, > sq_proc = 0xfffffe0012e48900, sq_flags = 1}, p_oppid = 0, p_vmspace = 0xfffffe00128c5310, p_swtick = 12384, p_realtimer = {it_interval = {tv_sec = 0, tv_usec = 0}, > it_value = {tv_sec = 0, tv_usec = 0}}, p_ru = {ru_utime = {tv_sec = 0, tv_usec = 0}, ru_stime = {tv_sec = 0, tv_usec = 0}, ru_maxrss = 0, ru_ixrss = 0, ru_idrss = 0, > ru_isrss = 0, ru_minflt = 0, ru_majflt = 0, ru_nswap = 0, ru_inblock = 0, ru_oublock = 0, ru_msgsnd = 0, ru_msgrcv = 0, ru_nsignals = 0, ru_nvcsw = 0, ru_nivcsw = 0}, > p_rux = {rux_runtime = 1633285860, rux_uticks = 63, rux_sticks = 44, rux_iticks = 0, rux_uu = 482021, rux_su = 336649, rux_tu = 818671}, p_crux = {rux_runtime = 770335510, > rux_uticks = 11, rux_sticks = 39, rux_iticks = 0, rux_uu = 84947, rux_su = 301176, rux_tu = 386124}, p_profthreads = 0, p_exitthreads = 0, p_traceflag = 0, > p_tracevp = 0x0, p_tracecred = 0x0, p_textvp = 0xfffffe00427ae780, p_lock = 0, p_sigiolst = {slh_first = 0x0}, p_sigparent = 20, p_sig = 0, p_code = 0, p_stops = 0, > p_stype = 0, p_step = 0 '\0', p_pfsflags = 0 '\0', p_nlminfo = 0x0, p_aioinfo = 0x0, p_singlethread = 0x0, p_suspcount = 0, p_xthread = 0x0, p_boundary_count = 0, > p_pendingcnt = 0, p_itimers = 0x0, p_magic = 3203398350, p_osrel = 900025, p_comm = "bash", '\0' <repeats 15 times>, p_pgrp = 0xfffffe0005e59300, > p_sysent = 0xffffffff80c6b320, p_args = 0xfffffe0002ba2880, p_cpulimit = 9223372036854775807, p_nice = 0 '\0', p_fibnum = 0, p_xstat = 0, p_klist = {kl_list = { > slh_first = 0x0}, kl_lock = 0xffffffff804fd660 <knlist_mtx_lock>, kl_unlock = 0xffffffff804fd680 <knlist_mtx_unlock>, > kl_assert_locked = 0xffffffff804fd960 <knlist_mtx_assert_locked>, kl_assert_unlocked = 0xffffffff804fd940 <knlist_mtx_assert_unlocked>, kl_lockarg = 0xfffffe0012e489f8}, > p_numthreads = 1, p_md = {md_ldt = 0x0, md_ldt_sd = {sd_lolimit = 0, sd_lobase = 0, sd_type = 0, sd_dpl = 0, sd_p = 0, sd_hilimit = 0, sd_xx0 = 0, sd_gran = 0, > sd_hibase = 0, sd_xx1 = 0, sd_mbz = 0, sd_xx2 = 0}}, p_itcallout = {c_links = {sle = {sle_next = 0x0}, tqe = {tqe_next = 0x0, tqe_prev = 0x0}}, c_time = 0, > c_arg = 0x0, c_func = 0, c_lock = 0x0, c_flags = 16, c_cpu = 0}, p_acflag = 0, p_peers = 0x0, p_leader = 0xfffffe0012e48900, p_emuldata = 0x0, p_label = 0x0, > p_sched = 0xfffffe0012e48d80, p_ktr = {stqh_first = 0x0, stqh_last = 0xfffffe0012e48d30}, p_mqnotifier = {lh_first = 0x0}, p_dtrace = 0xfffffe0005d85b80, p_pwait = { > cv_description = 0xffffffff80954fbe "ppwait", cv_waiters = 0}, p_dbgwait = {cv_description = 0xffffffff80954fc5 "dbgwait", cv_waiters = 0}, p_prev_runtime = 0, > p_racct = 0x0} > > I'm not sure if switching to the console first matters. > > Before the panic, the states of gdb and claws-mail were: > > fk_at_r500 ~ $procstat -kk $(pgrep claws-mail) > PID TID COMM TDNAME KSTACK > 15813 100528 claws-mail initial thread mi_switch+0x21d thread_suspend_switch+0x103 ptracestop+0x7a cursig+0x4af ast+0x198 doreti_ast+0x1f > fk_at_r500 ~ $procstat -kk $(pgrep gdb) > PID TID COMM TDNAME KSTACK > 15823 100928 gdb - mi_switch+0x21d sleepq_switch+0x123 sleepq_catch_signals+0x2a2 sleepq_wait_sig+0x16 _cv_wait_sig+0x14c tty_wait+0x48 ttydisc_read+0xd4 ttydev_read+0xab devfs_read_f+0x88 dofileread+0xa1 kern_readv+0x60 read+0x55 syscallenter+0x1cb syscall+0x4c Xfast_syscall+0xdd > > I believe I also reproduced the problem with a kernel from Tuesday, > but dumping core from ddb failed so I can't check. > > I cannot reproduce the panic with a kernel from 2011-03-12 > which I haven't deleted yet, but didn't try to bisect it further. > > gdb can't attach to claws-mail either, but the panic doesn't occur: > > fk_at_r500 ~ $gdb -p $(pgrep claws-mail) > GNU gdb 6.1.1 [FreeBSD] > Copyright 2004 Free Software Foundation, Inc. > GDB is free software, covered by the GNU General Public License, and you are > welcome to change it and/or distribute copies of it under certain conditions. > Type "show copying" to see the conditions. > There is absolutely no warranty for GDB. Type "show warranty" for details. > This GDB was configured as "amd64-marcel-freebsd". > Attaching to process 3432 > /usr/src/gnu/usr.bin/gdb/libgdb/../../../../contrib/gdb/gdb/solib-svr4.c:1444: i > nternal-error: legacy_fetch_link_map_offsets called without legacy link_map supp > ort enabled. > A problem internal to GDB has been detected, > further debugging may prove unreliable. > Quit this debugging session? (y or n) y > > /usr/src/gnu/usr.bin/gdb/libgdb/../../../../contrib/gdb/gdb/solib-svr4.c:1444: i > nternal-error: legacy_fetch_link_map_offsets called without legacy link_map supp > ort enabled. > A problem internal to GDB has been detected, > further debugging may prove unreliable. > Create a core file of GDB? (y or n) n > > In related news, the kernel from yesterday also seems to panic/hang/whatever > when claws-mail segfaults due to an invalid memory access, but as I was running > X, I got no core dumps for those problems either. For the first problem, try patch at the end. Regarding the 'panic/hang/whatever', I cannot reproduce it with a program that specifically access unmapped memory. You would need to catch core dump and provide the backtrace. diff --git a/sys/kern/kern_exit.c b/sys/kern/kern_exit.c index 5fec31b..2d35ed5 100644 --- a/sys/kern/kern_exit.c +++ b/sys/kern/kern_exit.c _at__at_ -903,15 +903,19 _at__at_ loop: void proc_reparent(struct proc *child, struct proc *parent) { + int locked; sx_assert(&proctree_lock, SX_XLOCKED); PROC_LOCK_ASSERT(child, MA_OWNED); if (child->p_pptr == parent) return; - PROC_LOCK(parent); + locked = PROC_LOCKED(parent); + if (!locked) + PROC_LOCK(parent); racct_add_force(parent, RACCT_NPROC, 1); - PROC_UNLOCK(parent); + if (!locked) + PROC_UNLOCK(parent); PROC_LOCK(child->p_pptr); racct_sub(child->p_pptr, RACCT_NPROC, 1); sigqueue_take(child->p_ksi);
This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:13 UTC