Re: zfs deadlock on r360452 relating to busy vm page

From: Andriy Gapon <avg_at_FreeBSD.org>
Date: Wed, 13 May 2020 10:35:43 +0300
On 13/05/2020 01:47, Bryan Drewery wrote:
> Trivial repro:
> 
> dd if=/dev/zero of=blah & tail -F blah
> ^C
> load: 0.21  cmd: tail 72381 [prev->lr_read_cv] 2.17r 0.00u 0.01s 0% 2600k
> #0 0xffffffff80bce615 at mi_switch+0x155
> #1 0xffffffff80c1cfea at sleepq_switch+0x11a
> #2 0xffffffff80b57f0a at _cv_wait+0x15a
> #3 0xffffffff829ddab6 at rangelock_enter+0x306
> #4 0xffffffff829ecd3f at zfs_freebsd_getpages+0x14f
> #5 0xffffffff810e3ab9 at VOP_GETPAGES_APV+0x59
> #6 0xffffffff80f349e7 at vnode_pager_getpages+0x37
> #7 0xffffffff80f2a93f at vm_pager_get_pages+0x4f
> #8 0xffffffff80f054b0 at vm_fault+0x780
> #9 0xffffffff80f04bde at vm_fault_trap+0x6e
> #10 0xffffffff8106544e at trap_pfault+0x1ee
> #11 0xffffffff81064a9c at trap+0x44c
> #12 0xffffffff8103a978 at calltrap+0x8

In r329363 I reworked zfs_getpages and introduced range locking to it.
At the time I believed that it was safe, and maybe it was then; please see the
commit message.
There have indeed been many performance and concurrency improvements to the VM
system since then, and r358443 is one of them.
I am not sure how best to resolve the problem.  Maybe someone who knows the
latest VM code better than I do can comment on the assumptions stated in the
commit message.

In illumos (and, I think, in OpenZFS/ZoL) there is no range locking in this
corner of the code, because a similar deadlock was hit there a long time ago.
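
The shape of the deadlock, as the two backtraces quoted below show, is a
classic ABBA lock-order inversion.  Here is a minimal userland sketch of it,
assuming plain pthreads mutexes standing in for the two kernel primitives:
lock_a plays the role of the ZFS range lock and lock_b the page's busy state.
The names and the program are illustrative only, not kernel code.

#include <pthread.h>
#include <unistd.h>

static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER; /* ~ range lock */
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER; /* ~ page busy  */

/* Fault path: the page is busied first, then the range lock is taken. */
static void *
fault_path(void *arg)
{
        (void)arg;
        pthread_mutex_lock(&lock_b);    /* vm_fault busies the page */
        sleep(1);                       /* widen the race window */
        pthread_mutex_lock(&lock_a);    /* zfs_getpages: rangelock_enter */
        pthread_mutex_unlock(&lock_a);
        pthread_mutex_unlock(&lock_b);
        return (NULL);
}

/* Write path: the range lock is taken first, then the busy page is waited on. */
static void *
write_path(void *arg)
{
        (void)arg;
        pthread_mutex_lock(&lock_a);    /* zfs_write: rangelock_enter */
        sleep(1);
        pthread_mutex_lock(&lock_b);    /* update_pages: page_busy */
        pthread_mutex_unlock(&lock_b);
        pthread_mutex_unlock(&lock_a);
        return (NULL);
}

int
main(void)
{
        pthread_t t1, t2;

        pthread_create(&t1, NULL, fault_path, NULL);
        pthread_create(&t2, NULL, write_path, NULL);
        pthread_join(t1, NULL);         /* never returns: ABBA deadlock */
        pthread_join(t2, NULL);
        return (0);
}

Compiled with -lpthread and run, it hangs forever in the first join, which is
exactly the shape of the report below: each thread holds the lock the other
one wants.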

> On 5/12/2020 3:13 PM, Bryan Drewery wrote:
>>> panic: deadlres_td_sleep_q: possible deadlock detected for 0xfffffe25eefa2e00 (find), blocked for 1802392 ticks
...
>>> (kgdb) backtrace
>>> #0  sched_switch (td=0xfffffe255eac0000, flags=<optimized out>) at /usr/src/sys/kern/sched_ule.c:2147
>>> #1  0xffffffff80bce615 in mi_switch (flags=260) at /usr/src/sys/kern/kern_synch.c:542
>>> #2  0xffffffff80c1cfea in sleepq_switch (wchan=0xfffff810fb57dd48, pri=0) at /usr/src/sys/kern/subr_sleepqueue.c:625
>>> #3  0xffffffff80b57f0a in _cv_wait (cvp=0xfffff810fb57dd48, lock=0xfffff80049a99040) at /usr/src/sys/kern/kern_condvar.c:146
>>> #4  0xffffffff82434ab6 in rangelock_enter_reader (rl=0xfffff80049a99018, new=0xfffff8022cadb100) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_rlock.c:429
>>> #5  rangelock_enter (rl=0xfffff80049a99018, off=<optimized out>, len=<optimized out>, type=<optimized out>) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_rlock.c:477
>>> #6  0xffffffff82443d3f in zfs_getpages (vp=<optimized out>, ma=0xfffffe259f204b18, count=<optimized out>, rbehind=0xfffffe259f204ac4, rahead=0xfffffe259f204ad0) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:4500
>>> #7  zfs_freebsd_getpages (ap=<optimized out>) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:4567
>>> #8  0xffffffff810e3ab9 in VOP_GETPAGES_APV (vop=0xffffffff8250a1e0 <zfs_vnodeops>, a=0xfffffe259f2049f0) at vnode_if.c:2644
>>> #9  0xffffffff80f349e7 in VOP_GETPAGES (vp=<unavailable>, m=<optimized out>, count=<unavailable>, rbehind=<unavailable>, rahead=<unavailable>) at ./vnode_if.h:1171
>>> #10 vnode_pager_getpages (object=<optimized out>, m=<optimized out>, count=<unavailable>, rbehind=<unavailable>, rahead=<unavailable>) at /usr/src/sys/vm/vnode_pager.c:743
>>> #11 0xffffffff80f2a93f in vm_pager_get_pages (object=0xfffff806cb637c60, m=0xfffffe259f204b18, count=1, rbehind=<unavailable>, rahead=<unavailable>) at /usr/src/sys/vm/vm_pager.c:305
>>> #12 0xffffffff80f054b0 in vm_fault_getpages (fs=<optimized out>, nera=0, behindp=<optimized out>, aheadp=<optimized out>) at /usr/src/sys/vm/vm_fault.c:1163
>>> #13 vm_fault (map=<optimized out>, vaddr=<optimized out>, fault_type=<optimized out>, fault_flags=<optimized out>, m_hold=<optimized out>) at /usr/src/sys/vm/vm_fault.c:1394
>>> #14 0xffffffff80f04bde in vm_fault_trap (map=0xfffffe25653949e8, vaddr=<optimized out>, fault_type=<optimized out>, fault_flags=0, signo=0xfffffe259f204d04, ucode=0xfffffe259f204d00) at /usr/src/sys/vm/vm_fault.c:589
>>> #15 0xffffffff8106544e in trap_pfault (frame=0xfffffe259f204d40, usermode=<optimized out>, signo=<optimized out>, ucode=<unavailable>) at /usr/src/sys/amd64/amd64/trap.c:821
>>> #16 0xffffffff81064a9c in trap (frame=0xfffffe259f204d40) at /usr/src/sys/amd64/amd64/trap.c:340
>>> #17 <signal handler called>
>>> #18 0x00000000002034fc in ?? ()
...
>>> (kgdb) thread
>>> [Current thread is 8 (Thread 101255)]
>>> (kgdb) backtrace
>>> #0  sched_switch (td=0xfffffe25c8e9bc00, flags=<optimized out>) at /usr/src/sys/kern/sched_ule.c:2147
>>> #1  0xffffffff80bce615 in mi_switch (flags=260) at /usr/src/sys/kern/kern_synch.c:542
>>> #2  0xffffffff80c1cfea in sleepq_switch (wchan=0xfffffe001cbca850, pri=84) at /usr/src/sys/kern/subr_sleepqueue.c:625
>>> #3  0xffffffff80f1de50 in _vm_page_busy_sleep (obj=<optimized out>, m=0xfffffe001cbca850, pindex=<optimized out>, wmesg=<optimized out>, allocflags=21504, locked=false) at /usr/src/sys/vm/vm_page.c:1094
>>> #4  0xffffffff80f241f7 in vm_page_grab_sleep (object=0xfffff806cb637c60, m=<optimized out>, pindex=<optimized out>, wmesg=<optimized out>, allocflags=21504, locked=<error reading variable: Cannot access memory at address 0x0>) at /usr/src/sys/vm/vm_page.c:4326
>>> #5  vm_page_acquire_unlocked (object=0xfffff806cb637c60, pindex=1098, prev=<optimized out>, mp=0xfffffe2717fc6730, allocflags=21504) at /usr/src/sys/vm/vm_page.c:4469
>>> #6  0xffffffff80f24c61 in vm_page_grab_valid_unlocked (mp=0xfffffe2717fc6730, object=0xfffff806cb637c60, pindex=1098, allocflags=21504) at /usr/src/sys/vm/vm_page.c:4645
>>> #7  0xffffffff82440246 in page_busy (vp=0xfffff80571f29500, start=4497408, off=<optimized out>, nbytes=<optimized out>) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:414
>>> #8  update_pages (vp=0xfffff80571f29500, start=4497408, len=32, os=0xfffff8096a277400, oid=2209520, segflg=<optimized out>, tx=<optimized out>) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:482
>>> #9  zfs_write (vp=<optimized out>, uio=<optimized out>, ioflag=0, cr=<optimized out>, ct=<optimized out>) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:1071
>>> #10 zfs_freebsd_write (ap=<optimized out>) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:4838
>>> #11 0xffffffff810e0eaf in VOP_WRITE_APV (vop=0xffffffff8250a1e0 <zfs_vnodeops>, a=0xfffffe2717fc68c8) at vnode_if.c:925
>>> #12 0xffffffff80cb574c in VOP_WRITE (vp=0xfffff80571f29500, uio=0xfffffe2717fc6bb0, ioflag=8323073, cred=<optimized out>) at ./vnode_if.h:413
>>> #13 vn_write (fp=0xfffff8048195e8c0, uio=<optimized out>, active_cred=<optimized out>, flags=<optimized out>, td=<optimized out>) at /usr/src/sys/kern/vfs_vnops.c:894
>>> #14 0xffffffff80cb50c3 in vn_io_fault_doio (args=0xfffffe2717fc6af0, uio=0xfffffe2717fc6bb0, td=0xfffffe25c8e9bc00) at /usr/src/sys/kern/vfs_vnops.c:959
>>> #15 0xffffffff80cb1c8c in vn_io_fault1 (vp=<optimized out>, uio=0xfffffe2717fc6bb0, args=0xfffffe2717fc6af0, td=0xfffffe25c8e9bc00) at /usr/src/sys/kern/vfs_vnops.c:1077
>>> #16 0xffffffff80cafa32 in vn_io_fault (fp=0xfffff8048195e8c0, uio=0xfffffe2717fc6bb0, active_cred=0xfffff80f2cc12708, flags=0, td=<optimized out>) at /usr/src/sys/kern/vfs_vnops.c:1181
>>> #17 0xffffffff80c34331 in fo_write (fp=0xfffff8048195e8c0, uio=0xfffffe2717fc6bb0, active_cred=<unavailable>, flags=<unavailable>, td=0xfffffe25c8e9bc00) at /usr/src/sys/sys/file.h:326
>>> #18 dofilewrite (td=0xfffffe25c8e9bc00, fd=2, fp=0xfffff8048195e8c0, auio=0xfffffe2717fc6bb0, offset=<optimized out>, flags=<optimized out>) at /usr/src/sys/kern/sys_generic.c:564
>>> #19 0xffffffff80c33eb0 in kern_writev (td=0xfffffe25c8e9bc00, fd=2, auio=<optimized out>) at /usr/src/sys/kern/sys_generic.c:491
>>> #20 sys_write (td=0xfffffe25c8e9bc00, uap=<optimized out>) at /usr/src/sys/kern/sys_generic.c:406
>>> #21 0xffffffff8106623d in syscallenter (td=<optimized out>) at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:150
>>> #22 amd64_syscall (td=0xfffffe25c8e9bc00, traced=0) at /usr/src/sys/amd64/amd64/trap.c:1161
>>> #23 <signal handler called>
>>> #24 0x000000080043d53a in ?? ()
>>
>> Maybe r358443 is related?
>>
>>
>>> (kgdb) frame 4
>>> #4  0xffffffff80f241f7 in vm_page_grab_sleep (object=0xfffff806cb637c60, m=<optimized out>, pindex=<optimized out>, wmesg=<optimized out>, allocflags=21504, locked=<error reading variable: Cannot access memory at address 0x0>) at /usr/src/sys/vm/vm_page.c:4326
>>> 4326            if (_vm_page_busy_sleep(object, m, m->pindex, wmesg, allocflags,
>>> (kgdb) p *object
>>> $8 = {lock = {lock_object = {lo_name = 0xffffffff8114fa30 "vm object", lo_flags = 627245056, lo_data = 0, lo_witness = 0x0}, rw_lock = 1}, object_list = {tqe_next = 0xfffff806cb637d68, tqe_prev = 0xfffff806cb637b78}, shadow_head = {lh_first = 0x0}, shadow_list = {le_next = 0xffffffffffffffff,
>>>     le_prev = 0xffffffffffffffff}, memq = {tqh_first = 0xfffffe001cbca850, tqh_last = 0xfffffe001cbca860}, rtree = {rt_root = 18446741875168421969}, size = 1099, domain = {dr_policy = 0x0, dr_iter = 0}, generation = 1, cleangeneration = 1, ref_count = 2, shadow_count = 0, memattr = 6 '\006', type = 2 '\002',
>>>   flags = 4096, pg_color = 0, paging_in_progress = {__count = 2}, busy = {__count = 0}, resident_page_count = 1, backing_object = 0x0, backing_object_offset = 0, pager_object_list = {tqe_next = 0x0, tqe_prev = 0x0}, rvq = {lh_first = 0x0}, handle = 0xfffff80571f29500, un_pager = {vnp = {vnp_size = 4499568,
>>>       writemappings = 0}, devp = {devp_pglist = {tqh_first = 0x44a870, tqh_last = 0x0}, ops = 0x0, dev = 0x0}, sgp = {sgp_pglist = {tqh_first = 0x44a870, tqh_last = 0x0}}, swp = {swp_tmpfs = 0x44a870, swp_blks = {pt_root = 0}, writemappings = 0}}, cred = 0x0, charge = 0, umtx_data = 0x0}
>>> (kgdb) frame 5
>>> #5  vm_page_acquire_unlocked (object=0xfffff806cb637c60, pindex=1098, prev=<optimized out>, mp=0xfffffe2717fc6730, allocflags=21504) at /usr/src/sys/vm/vm_page.c:4469
>>> 4469                    if (!vm_page_grab_sleep(object, m, pindex, "pgnslp",
>>> (kgdb) p *m
>>> $9 = {plinks = {q = {tqe_next = 0xffffffffffffffff, tqe_prev = 0xffffffffffffffff}, s = {ss = {sle_next = 0xffffffffffffffff}}, memguard = {p = 18446744073709551615, v = 18446744073709551615}, uma = {slab = 0xffffffffffffffff, zone = 0xffffffffffffffff}}, listq = {tqe_next = 0x0, tqe_prev = 0xfffff806cb637ca8},
>>>   object = 0xfffff806cb637c60, pindex = 1098, phys_addr = 18988408832, md = {pv_list = {tqh_first = 0x0, tqh_last = 0xfffffe001cbca888}, pv_gen = 44682, pat_mode = 6}, ref_count = 2147483648, busy_lock = 1588330502, a = {{flags = 0, queue = 255 '\377', act_count = 0 '\000'}, _bits = 16711680}, order = 13 '\r',
>>>   pool = 0 '\000', flags = 1 '\001', oflags = 0 '\000', psind = 0 '\000', segind = 6 '\006', valid = 0 '\000', dirty = 0 '\000'}
>>
>> Pretty sure this thread is holding the rangelock from zfs_write() that
>> tail is waiting on. So what exactly is this thread (101255) waiting for?
>> I'm not sure how to track down what is using vm object
>> 0xfffff806cb637c60. If the tail thread busied the page, then they are
>> waiting on each other, I guess. If that's true, then r358443 removing the
>> write lock on the object in update_pages() could be a problem.
>>
>>
>> I'm not sure the rest is interesting. I think they are just waiting on
>> the locked vnode, but I include it here in case I missed something.
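
To summarize the two stacks as I read them (the thread roles are inferred
from the backtraces above, so take this as my interpretation):

  zfs_write thread (101255):  holds the range lock taken in zfs_write()
                              and sleeps in vm_page_grab_sleep() waiting
                              for the page's busy state to clear
  page-faulting thread:       holds the page busy (vm_fault busies the
                              page before calling the pager) and sleeps
                              in rangelock_enter() waiting for the range
                              lock

If both threads are indeed operating on the same page of the same object,
each holds exactly the resource the other is waiting for, so neither can
make progress.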


-- 
Andriy Gapon