Hi, I'm using FreeBSD 13-CURRENT (pre-ZoF, r359724) on my laptop with installed Gnome. Sometimes (once a week/month) gnome hangs and the system may be still responsible (may be not). This week it happened again and I've gathered information via ddb/textdump and rebooted laptop. gnome-shell is trying to get exclusive lock on some directory according to information from "show alllocks" and "bt": Process 2355 (gnome-shell) thread 0xfffffe00b03d4700 (100483) exclusive lockmgr zfs (zfs) r = 0 (0xfffff802112fe808) locked _at_ /usr/src/sys/kern/vfs_subr.c:2930 Tracing command gnome-shell pid 2355 tid 100483 td 0xfffffe00b03d4700 sched_switch() at sched_switch+0x5b2/frame 0xfffffe00aeff7220 mi_switch() at mi_switch+0x155/frame 0xfffffe00aeff7240 sleepq_switch() at sleepq_switch+0x11a/frame 0xfffffe00aeff7280 sleeplk() at sleeplk+0x106/frame 0xfffffe00aeff72e0 lockmgr_xlock_hard() at lockmgr_xlock_hard+0x21f/frame 0xfffffe00aeff7380 _vn_lock() at _vn_lock+0x54/frame 0xfffffe00aeff73e0 zfs_lookup() at zfs_lookup+0x5f7/frame 0xfffffe00aeff74c0 zfs_freebsd_cachedlookup() at zfs_freebsd_cachedlookup+0x8e/frame 0xfffffe00aeff7600 vfs_cache_lookup() at vfs_cache_lookup+0xa8/frame 0xfffffe00aeff7650 lookup() at lookup+0x5e1/frame 0xfffffe00aeff76f0 namei() at namei+0x524/frame 0xfffffe00aeff77e0 vn_open_cred() at vn_open_cred+0x10b/frame 0xfffffe00aeff7930 kern_openat() at kern_openat+0x1fa/frame 0xfffffe00aeff7aa0 filemon_wrapper_openat() at filemon_wrapper_openat+0x15/frame 0xfffffe00aeff7ad0 amd64_syscall() at amd64_syscall+0x140/frame 0xfffffe00aeff7bf0 fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe00aeff7bf0 --- syscall (499, FreeBSD ELF64, filemon_wrapper_openat), rip = 0x80148dbca, rsp = 0x7fffffffe248, rbp = 0x7fffffffe2c0 --- Line vfs_subr.c:2930 mentioned by lockmgr is part of vget_finish. I've found blocked stack with shared lock request: Tracing command gsd-color pid 2422 tid 100784 td 0xfffffe00b08e0700 sched_switch() at sched_switch+0x5b2/frame 0xfffffe00b07084f0 mi_switch() at mi_switch+0x155/frame 0xfffffe00b0708510 sleepq_switch() at sleepq_switch+0x11a/frame 0xfffffe00b0708550 sleeplk() at sleeplk+0x106/frame 0xfffffe00b07085b0 lockmgr_slock_hard() at lockmgr_slock_hard+0x1ce/frame 0xfffffe00b0708640 _vn_lock() at _vn_lock+0x54/frame 0xfffffe00b07086a0 vget_finish() at vget_finish+0x42/frame 0xfffffe00b07086d0 cache_lookup() at cache_lookup+0x57c/frame 0xfffffe00b0708790 vfs_cache_lookup() at vfs_cache_lookup+0x7d/frame 0xfffffe00b07087e0 lookup() at lookup+0x5e1/frame 0xfffffe00b0708880 namei() at namei+0x524/frame 0xfffffe00b0708970 kern_accessat() at kern_accessat+0x106/frame 0xfffffe00b0708ad0 amd64_syscall() at amd64_syscall+0x140/frame 0xfffffe00b0708bf0 fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe00b0708bf0 --- syscall (33, FreeBSD ELF64, sys_access), rip = 0x800fd0eba, rsp = 0x7fffffffe268, rbp = 0x7fffffffe380 --- All other locked threads look more interesting. Thread 100747 is trying to write into file and waiting for page, but thread 101436 is trying to handle page fault and waiting for a shared lock of locked_range due to present writer. I suppose it's deadlock caused all troubles. Tracing command dconf-service pid 2384 tid 100747 td 0xfffffe00b08cc000 sched_switch() at sched_switch+0x5b2/frame 0xfffffe00b0d762e0 mi_switch() at mi_switch+0x155/frame 0xfffffe00b0d76300 sleepq_switch() at sleepq_switch+0x11a/frame 0xfffffe00b0d76340 _vm_page_busy_sleep() at _vm_page_busy_sleep+0x110/frame 0xfffffe00b0d76390 vm_page_acquire_unlocked() at vm_page_acquire_unlocked+0x177/frame 0xfffffe00b0d763f0 vm_page_grab_valid_unlocked() at vm_page_grab_valid_unlocked+0x51/frame 0xfffffe00b0d76430 zfs_freebsd_write() at zfs_freebsd_write+0x9b6/frame 0xfffffe00b0d76640 VOP_WRITE_APV() at VOP_WRITE_APV+0xa7/frame 0xfffffe00b0d76750 vn_write() at vn_write+0x2a4/frame 0xfffffe00b0d767d0 vn_io_fault_doio() at vn_io_fault_doio+0x43/frame 0xfffffe00b0d76830 vn_io_fault1() at vn_io_fault1+0x16c/frame 0xfffffe00b0d76980 vn_io_fault() at vn_io_fault+0x182/frame 0xfffffe00b0d769f0 dofilewrite() at dofilewrite+0x81/frame 0xfffffe00b0d76a40 kern_pwritev() at kern_pwritev+0x62/frame 0xfffffe00b0d76a80 sys_pwrite() at sys_pwrite+0x8a/frame 0xfffffe00b0d76ad0 amd64_syscall() at amd64_syscall+0x140/frame 0xfffffe00b0d76bf0 fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe00b0d76bf0 --- syscall (476, FreeBSD ELF64, sys_pwrite), rip = 0x8007428ca, rsp = 0x7fffffffdf08, rbp = 0x7fffffffdf20 --- Tracing command evolution pid 4536 tid 101436 td 0xfffffe00bf484c00 sched_switch() at sched_switch+0x5b2/frame 0xfffffe00bfd446e0 mi_switch() at mi_switch+0x155/frame 0xfffffe00bfd44700 sleepq_switch() at sleepq_switch+0x11a/frame 0xfffffe00bfd44740 _cv_wait() at _cv_wait+0x15a/frame 0xfffffe00bfd447a0 rangelock_enter() at rangelock_enter+0x306/frame 0xfffffe00bfd447f0 zfs_freebsd_getpages() at zfs_freebsd_getpages+0x14f/frame 0xfffffe00bfd448a0 vnode_pager_getpages() at vnode_pager_getpages+0x37/frame 0xfffffe00bfd448e0 vm_pager_get_pages() at vm_pager_get_pages+0x4f/frame 0xfffffe00bfd44930 vm_fault() at vm_fault+0x780/frame 0xfffffe00bfd44a40 vm_fault_trap() at vm_fault_trap+0x6e/frame 0xfffffe00bfd44a80 trap_pfault() at trap_pfault+0x1ee/frame 0xfffffe00bfd44ae0 trap() at trap+0x44c/frame 0xfffffe00bfd44bf0 calltrap() at calltrap+0x8/frame 0xfffffe00bfd44bf0 --- trap 0xc, rip = 0x80a55de3f, rsp = 0x7fffffffcc60, rbp = 0x7fffffffcc60 --- If it happens again, I will try to gather more information about locks, corresponding files and directories. For a while I wonder how to reproduce this case. For instance, one program has mmapped huge file and reads it infinitely, another program write random data in the middle of the file. But it's unclear how to trigger page fault manually. As I understand, kernel provides "vm_page_set_invalid" to mark it invalid. Is there any way to invalidate a page of backed object from userland? I will appreciate any advice on how to troubleshoot this issue. The full output of ddb is here: https://people.freebsd.org/~mizhka/ddb.txt Thanks!Received on Tue Sep 29 2020 - 09:59:58 UTC
This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:41:25 UTC