Possible deadlock on IO / page fault

From: Michael Zhilin <mizhka_at_freebsd.org> Date: Tue, 29 Sep 2020 14:59:43 +0300 · This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:41:25 UTC

Hi,

I'm using FreeBSD 13-CURRENT (pre-ZoF, r359724) on my laptop with installed
Gnome. Sometimes
(once a week/month) gnome hangs and the system may be still responsible
(may be not).
This week it happened again and I've gathered information via ddb/textdump
and rebooted laptop.

gnome-shell is trying to get exclusive lock on some directory according to
information
from "show alllocks" and "bt":

    Process 2355 (gnome-shell) thread 0xfffffe00b03d4700 (100483)
    exclusive lockmgr zfs (zfs) r = 0 (0xfffff802112fe808) locked _at_
/usr/src/sys/kern/vfs_subr.c:2930

    Tracing command gnome-shell pid 2355 tid 100483 td 0xfffffe00b03d4700
    sched_switch() at sched_switch+0x5b2/frame 0xfffffe00aeff7220
    mi_switch() at mi_switch+0x155/frame 0xfffffe00aeff7240
    sleepq_switch() at sleepq_switch+0x11a/frame 0xfffffe00aeff7280
    sleeplk() at sleeplk+0x106/frame 0xfffffe00aeff72e0
    lockmgr_xlock_hard() at lockmgr_xlock_hard+0x21f/frame
0xfffffe00aeff7380
    _vn_lock() at _vn_lock+0x54/frame 0xfffffe00aeff73e0
    zfs_lookup() at zfs_lookup+0x5f7/frame 0xfffffe00aeff74c0
    zfs_freebsd_cachedlookup() at zfs_freebsd_cachedlookup+0x8e/frame
0xfffffe00aeff7600
    vfs_cache_lookup() at vfs_cache_lookup+0xa8/frame 0xfffffe00aeff7650
    lookup() at lookup+0x5e1/frame 0xfffffe00aeff76f0
    namei() at namei+0x524/frame 0xfffffe00aeff77e0
    vn_open_cred() at vn_open_cred+0x10b/frame 0xfffffe00aeff7930
    kern_openat() at kern_openat+0x1fa/frame 0xfffffe00aeff7aa0
    filemon_wrapper_openat() at filemon_wrapper_openat+0x15/frame
0xfffffe00aeff7ad0
    amd64_syscall() at amd64_syscall+0x140/frame 0xfffffe00aeff7bf0
    fast_syscall_common() at fast_syscall_common+0x101/frame
0xfffffe00aeff7bf0
    --- syscall (499, FreeBSD ELF64, filemon_wrapper_openat), rip =
0x80148dbca, rsp = 0x7fffffffe248, rbp = 0x7fffffffe2c0 ---

Line vfs_subr.c:2930 mentioned by lockmgr is part of vget_finish. I've
found blocked stack
with shared lock request:

    Tracing command gsd-color pid 2422 tid 100784 td 0xfffffe00b08e0700
    sched_switch() at sched_switch+0x5b2/frame 0xfffffe00b07084f0
    mi_switch() at mi_switch+0x155/frame 0xfffffe00b0708510
    sleepq_switch() at sleepq_switch+0x11a/frame 0xfffffe00b0708550
    sleeplk() at sleeplk+0x106/frame 0xfffffe00b07085b0
    lockmgr_slock_hard() at lockmgr_slock_hard+0x1ce/frame
0xfffffe00b0708640
    _vn_lock() at _vn_lock+0x54/frame 0xfffffe00b07086a0
    vget_finish() at vget_finish+0x42/frame 0xfffffe00b07086d0
    cache_lookup() at cache_lookup+0x57c/frame 0xfffffe00b0708790
    vfs_cache_lookup() at vfs_cache_lookup+0x7d/frame 0xfffffe00b07087e0
    lookup() at lookup+0x5e1/frame 0xfffffe00b0708880
    namei() at namei+0x524/frame 0xfffffe00b0708970
    kern_accessat() at kern_accessat+0x106/frame 0xfffffe00b0708ad0
    amd64_syscall() at amd64_syscall+0x140/frame 0xfffffe00b0708bf0
    fast_syscall_common() at fast_syscall_common+0x101/frame
0xfffffe00b0708bf0
    --- syscall (33, FreeBSD ELF64, sys_access), rip = 0x800fd0eba, rsp =
0x7fffffffe268, rbp = 0x7fffffffe380 ---

All other locked threads look more interesting. Thread 100747 is trying to
write
into file and waiting for page, but thread 101436 is trying to handle page
fault
and waiting for a shared lock of locked_range due to present writer. I
suppose it's
deadlock caused all troubles.

    Tracing command dconf-service pid 2384 tid 100747 td 0xfffffe00b08cc000
    sched_switch() at sched_switch+0x5b2/frame 0xfffffe00b0d762e0
    mi_switch() at mi_switch+0x155/frame 0xfffffe00b0d76300
    sleepq_switch() at sleepq_switch+0x11a/frame 0xfffffe00b0d76340
    _vm_page_busy_sleep() at _vm_page_busy_sleep+0x110/frame
0xfffffe00b0d76390
    vm_page_acquire_unlocked() at vm_page_acquire_unlocked+0x177/frame
0xfffffe00b0d763f0
    vm_page_grab_valid_unlocked() at vm_page_grab_valid_unlocked+0x51/frame
0xfffffe00b0d76430
    zfs_freebsd_write() at zfs_freebsd_write+0x9b6/frame 0xfffffe00b0d76640
    VOP_WRITE_APV() at VOP_WRITE_APV+0xa7/frame 0xfffffe00b0d76750
    vn_write() at vn_write+0x2a4/frame 0xfffffe00b0d767d0
    vn_io_fault_doio() at vn_io_fault_doio+0x43/frame 0xfffffe00b0d76830
    vn_io_fault1() at vn_io_fault1+0x16c/frame 0xfffffe00b0d76980
    vn_io_fault() at vn_io_fault+0x182/frame 0xfffffe00b0d769f0
    dofilewrite() at dofilewrite+0x81/frame 0xfffffe00b0d76a40
    kern_pwritev() at kern_pwritev+0x62/frame 0xfffffe00b0d76a80
    sys_pwrite() at sys_pwrite+0x8a/frame 0xfffffe00b0d76ad0
    amd64_syscall() at amd64_syscall+0x140/frame 0xfffffe00b0d76bf0
    fast_syscall_common() at fast_syscall_common+0x101/frame
0xfffffe00b0d76bf0
    --- syscall (476, FreeBSD ELF64, sys_pwrite), rip = 0x8007428ca, rsp =
0x7fffffffdf08, rbp = 0x7fffffffdf20 ---

    Tracing command evolution pid 4536 tid 101436 td 0xfffffe00bf484c00
    sched_switch() at sched_switch+0x5b2/frame 0xfffffe00bfd446e0
    mi_switch() at mi_switch+0x155/frame 0xfffffe00bfd44700
    sleepq_switch() at sleepq_switch+0x11a/frame 0xfffffe00bfd44740
    _cv_wait() at _cv_wait+0x15a/frame 0xfffffe00bfd447a0
    rangelock_enter() at rangelock_enter+0x306/frame 0xfffffe00bfd447f0
    zfs_freebsd_getpages() at zfs_freebsd_getpages+0x14f/frame
0xfffffe00bfd448a0
    vnode_pager_getpages() at vnode_pager_getpages+0x37/frame
0xfffffe00bfd448e0
    vm_pager_get_pages() at vm_pager_get_pages+0x4f/frame 0xfffffe00bfd44930
    vm_fault() at vm_fault+0x780/frame 0xfffffe00bfd44a40
    vm_fault_trap() at vm_fault_trap+0x6e/frame 0xfffffe00bfd44a80
    trap_pfault() at trap_pfault+0x1ee/frame 0xfffffe00bfd44ae0
    trap() at trap+0x44c/frame 0xfffffe00bfd44bf0
    calltrap() at calltrap+0x8/frame 0xfffffe00bfd44bf0
    --- trap 0xc, rip = 0x80a55de3f, rsp = 0x7fffffffcc60, rbp =
0x7fffffffcc60 ---

If it happens again, I will try to gather more information about locks,
corresponding
files and directories. For a while I wonder how to reproduce this case. For
instance,
one program has mmapped huge file and reads it infinitely, another program
write random
data in the middle of the file. But it's unclear how to trigger page fault
manually.
As I understand, kernel provides "vm_page_set_invalid" to mark it invalid.
Is there
any way to invalidate a page of backed object from userland?

I will appreciate any advice on how to troubleshoot this issue.

The full output of ddb is here: https://people.freebsd.org/~mizhka/ddb.txt
Thanks!