Hi there,

There seems to be some racy code in kern_event.c which is causing me to run into crashes. I've attached the test program used to generate them (build it and run the "go" script). The crashes were produced in a VM with 4 cores, on 11 Alpha 3 (and originally 10.3). They come in a few varieties:

1. "userret: returning with the following locks held". This one is the easiest to hit (assuming witness is enabled).

userret: returning with the following locks held:
exclusive sleep mutex process lock (process lock) r = 0 (0xfffff80006956120) locked @ /usr/src/sys/kern/kern_event.c:2125
panic: witness_warn
cpuid = 2
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe000039d8e0
vpanic() at vpanic+0x182/frame 0xfffffe000039d960
kassert_panic() at kassert_panic+0x126/frame 0xfffffe000039d9d0
witness_warn() at witness_warn+0x3c6/frame 0xfffffe000039daa0
userret() at userret+0x9d/frame 0xfffffe000039dae0
amd64_syscall() at amd64_syscall+0x406/frame 0xfffffe000039dbf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe000039dbf0
--- syscall (1, FreeBSD ELF64, sys_sys_exit), rip = 0x800b8a0ba, rsp = 0x7fffffffea98, rbp = 0x7fffffffeae0 ---
KDB: enter: panic
[ thread pid 64855 tid 100106 ]
Stopped at      kdb_enter+0x3b: movq    $0,kdb_why
db> show all locks
Process 64855 (watch) thread 0xfffff800066c3000 (100106)
exclusive sleep mutex process lock (process lock) r = 0 (0xfffff80006956120) locked @ /usr/src/sys/kern/kern_event.c:2125
Process 64855 (watch) thread 0xfffff8000696a500 (100244)
exclusive sleep mutex pmap (pmap) r = 0 (0xfffff800068c3138) locked @ /usr/src/sys/amd64/amd64/pmap.c:4067
exclusive sx vm map (user) (vm map (user)) r = 0 (0xfffff800068f6080) locked @ /usr/src/sys/vm/vm_map.c:3315
exclusive sx vm map (user) (vm map (user)) r = 0 (0xfffff800068c3080) locked @ /usr/src/sys/vm/vm_map.c:3311
db> ps
  pid  ppid  pgrp   uid  state   wmesg  wchan  cmd
64855   690   690     0  R+ (threaded)         watch
100106                   Run     CPU 2         main
100244                   Run     CPU 1         procmaker
100245                   Run     CPU 3         reaper
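For context: in the ps output above, "main" is the kqueue watcher, "procmaker" the forker, and "reaper" the reaper. The watcher's registration boils down to something like this (a minimal sketch rather than the attached program verbatim; watch_child() is an invented name):

    #include <sys/types.h>
    #include <sys/event.h>
    #include <sys/time.h>
    #include <err.h>

    /*
     * Sketch of the watcher thread's registration; not the attached
     * test program verbatim.  "child" is a pid just produced by the
     * forker thread.  NOTE_TRACK asks the kernel to attach knotes to
     * the child's descendants automatically, so knotes are constantly
     * being created and torn down.
     */
    static void
    watch_child(int kq, pid_t child)
    {
            struct kevent kev;

            EV_SET(&kev, child, EVFILT_PROC, EV_ADD,
                NOTE_EXIT | NOTE_EXEC | NOTE_FORK | NOTE_TRACK, 0, NULL);
            if (kevent(kq, &kev, 1, NULL, 0, NULL) == -1)
                    err(1, "kevent register");
    }

The watcher then loops in kevent() collecting events; that loop is where "main" is sitting inside kqueue_scan() when the panics fire.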
2. "Sleeping thread owns a non-sleepable lock". This one first drew my attention by showing up in a real-world application at work.

Sleeping thread (tid 100101, pid 76857) owns a non-sleepable lock
KDB: stack backtrace of thread 100101:
sched_switch() at sched_switch+0x2a5/frame 0xfffffe0000257690
mi_switch() at mi_switch+0xe1/frame 0xfffffe00002576d0
sleepq_catch_signals() at sleepq_catch_signals+0x16c/frame 0xfffffe0000257730
sleepq_timedwait_sig() at sleepq_timedwait_sig+0xf/frame 0xfffffe0000257760
_sleep() at _sleep+0x234/frame 0xfffffe00002577e0
kern_kevent_fp() at kern_kevent_fp+0x38a/frame 0xfffffe00002579d0
kern_kevent() at kern_kevent+0x9f/frame 0xfffffe0000257a30
sys_kevent() at sys_kevent+0x12a/frame 0xfffffe0000257ae0
amd64_syscall() at amd64_syscall+0x2d4/frame 0xfffffe0000257bf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe0000257bf0
--- syscall (363, FreeBSD ELF64, sys_kevent), rip = 0x800b6afea, rsp = 0x7fffffffea88, rbp = 0x7fffffffead0 ---
panic: sleeping thread
cpuid = 3
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0000225590
kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe0000225640
vpanic() at vpanic+0x126/frame 0xfffffe0000225680
panic() at panic+0x43/frame 0xfffffe00002256e0
propagate_priority() at propagate_priority+0x166/frame 0xfffffe0000225710
turnstile_wait() at turnstile_wait+0x282/frame 0xfffffe0000225750
__mtx_lock_sleep() at __mtx_lock_sleep+0x26b/frame 0xfffffe00002257d0
__mtx_lock_flags() at __mtx_lock_flags+0x5e/frame 0xfffffe00002257f0
proc_to_reap() at proc_to_reap+0x46/frame 0xfffffe0000225840
kern_wait6() at kern_wait6+0x202/frame 0xfffffe00002258f0
sys_wait4() at sys_wait4+0x72/frame 0xfffffe0000225ae0
amd64_syscall() at amd64_syscall+0x2d4/frame 0xfffffe0000225bf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe0000225bf0
--- syscall (7, FreeBSD ELF64, sys_wait4), rip = 0x800b209ba, rsp = 0x7fffdfdfcf48, rbp = 0x7fffdfdfcf80 ---
KDB: enter: panic
[ thread pid 76857 tid 100225 ]
Stopped at      kdb_enter+0x3e: movq    $0,kdb_why
db> show allchains
chain 1:
 thread 100225 (pid 76857, reaper) blocked on lock 0xfffff800413105f0 (sleep mutex) "process lock"
 thread 100101 (pid 76857, main) inhibited (3./4.)

There are a few others that I hit less frequently ("page fault while in kernel mode", "Kernel page fault with the following non-sleepable locks held"). I don't have a backtrace handy for those, but I believe they all have more or less the same cause.

The crashes occur because we acquire a knlist lock via the KN_LIST_LOCK macro, but by the time we call KN_LIST_UNLOCK, the knote's knlist reference (kn->kn_knlist) has been cleared by another thread. We are therefore unable to unlock the previously acquired lock, and we hold it until something causes a crash (such as the witness code noticing that we're returning to userland with the lock still held).
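For reference, here is roughly what the two macros look like, paraphrased from my reading of the sources (see sys/kern/kern_event.c for the exact text):

    /*
     * Paraphrased from sys/kern/kern_event.c.  Note that the LOCK and
     * UNLOCK macros each test kn->kn_knlist for themselves: if another
     * thread clears the pointer between the two, the unlock is
     * silently skipped and the lock stays held.
     */
    #define KN_LIST_LOCK(kn) do {                                       \
            if (kn->kn_knlist != NULL)                                  \
                    kn->kn_knlist->kl_lock(kn->kn_knlist->kl_lockarg);  \
    } while (0)

    #define KN_LIST_UNLOCK(kn) do {                                     \
            if (kn->kn_knlist != NULL)                                  \
                    kn->kn_knlist->kl_unlock(kn->kn_knlist->kl_lockarg);\
    } while (0)

Nothing ties the unlock-side test to whatever the lock side actually locked, which I believe is the crux of all of the crashes above.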
A walkthrough of what happens in the test program: there are 3 threads: one forks off short-lived child processes, one reaps them, and one tracks them via a kqueue (NOTE_EXIT | NOTE_EXEC | NOTE_FORK | NOTE_TRACK). I believe a crash generally looks like this:

1. The forker thread creates a short-lived child. That child dies and triggers a NOTE_EXIT event.

2. The kqueue thread is somewhere in kqueue_scan(), probably blocked at a KN_LIST_LOCK call.

3. The dying process calls into filt_proc() and notices that the KN_DETACHED flag is not set. It therefore calls knlist_remove_inevent() to take the knote out of the knlist. Importantly, this sets kn->kn_knlist to NULL, meaning the knlist lock can no longer be reached from the knote.

4. The kqueue thread, still in kqueue_scan(), acquires the lock via KN_LIST_LOCK. It does some work and then invokes the KN_LIST_UNLOCK macro. The macro finds that the knote no longer has a reference to a knlist and thus takes no action, leaving the lock in the locked state.

I believe there's also a small window where the KN_LIST_LOCK macro checks kn->kn_knlist and finds it non-NULL, but by the time it actually dereferences it, it has become NULL. That would produce the "page fault while in kernel mode" crash.

If someone familiar with this code sees an obvious fix, I'll be happy to test it. Otherwise, I'd appreciate any advice on fixing this. My first thought is that a 'struct knote' ought to have its own mutex controlling access to the flag fields and, ideally, the kn_knlist field: you would first acquire the knote's lock and then the knlist lock, ensuring that nobody could clear kn_knlist while you hold the knlist lock. The knlist lock, however, usually comes from whichever event-producing entity the knote tracks, so getting the lock ordering right between the per-knote mutex and that other lock seems potentially hard; I've sketched the two conflicting paths below. (Sometimes we call into kern_event.c functions with the knlist lock already held, acquired in code outside of kern_event.c. Consider, for example, calling KNOTE_LOCKED from kern_exit.c: the PROC_LOCK macro has already been used to acquire the process lock, which also serves as the knlist lock.)

Apropos of the knlist lock and its provenance: why is a lock belonging to the event-producing entity used to control access to the knlist and knote? Is it generally desirable to, say, hold the process lock while operating on a knlist attached to that process? It's not obvious to me that this is required, or even desirable. Perhaps a knlist should have its own lock rather than borrowing one from the event-producing entity, which might make this problem more straightforward to address.
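To make the ordering concern above concrete, here is a sketch of the two paths as they'd look with the mutex I'm imagining. To be clear, kn_lock does not exist in the tree; assume 'struct mtx kn_lock;' were added to struct knote, and that this code lives inside kern_event.c where the KN_LIST_* macros are visible:

    /* Path A: the kqueue_scan() side would take knote, then knlist. */
    static void
    scan_side(struct knote *kn)
    {
            mtx_lock(&kn->kn_lock);     /* hypothetical; pins kn->kn_knlist */
            KN_LIST_LOCK(kn);           /* order: knote -> knlist */
            /* ... deliver the event ... */
            KN_LIST_UNLOCK(kn);         /* now guaranteed to unlock what
                                           KN_LIST_LOCK actually locked */
            mtx_unlock(&kn->kn_lock);
    }

    /*
     * Path B: exit1() already holds the process lock (which doubles as
     * the knlist lock) when it posts NOTE_EXIT, and filt_proc() would
     * then need the knote lock: order knlist -> knote.
     */
    static void
    exit_side(struct proc *p)
    {
            PROC_LOCK(p);
            KNOTE_LOCKED(&p->p_klist, NOTE_EXIT);   /* -> filt_proc() ->
                                                       mtx_lock(&kn->kn_lock) */
            PROC_UNLOCK(p);
    }

Witness would flag A and B as a lock order reversal, which is why I suspect a bare per-knote mutex isn't the whole answer without also rethinking where the knlist lock comes from.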
Many thanks for any help, and please let me know if I've failed to make anything clear.

Cheers,
Eric