Re: zfs deadlock on r360452 relating to busy vm page

From: Mark Johnston <markj_at_freebsd.org> Date: Wed, 13 May 2020 10:42:57 -0400 · This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:41:24 UTC

On Wed, May 13, 2020 at 10:45:24AM +0300, Andriy Gapon wrote:
> On 13/05/2020 10:35, Andriy Gapon wrote:
> > On 13/05/2020 01:47, Bryan Drewery wrote:
> >> Trivial repro:
> >>
> >> dd if=/dev/zero of=blah & tail -F blah
> >> ^C
> >> load: 0.21  cmd: tail 72381 [prev->lr_read_cv] 2.17r 0.00u 0.01s 0% 2600k
> >> #0 0xffffffff80bce615 at mi_switch+0x155
> >> #1 0xffffffff80c1cfea at sleepq_switch+0x11a
> >> #2 0xffffffff80b57f0a at _cv_wait+0x15a
> >> #3 0xffffffff829ddab6 at rangelock_enter+0x306
> >> #4 0xffffffff829ecd3f at zfs_freebsd_getpages+0x14f
> >> #5 0xffffffff810e3ab9 at VOP_GETPAGES_APV+0x59
> >> #6 0xffffffff80f349e7 at vnode_pager_getpages+0x37
> >> #7 0xffffffff80f2a93f at vm_pager_get_pages+0x4f
> >> #8 0xffffffff80f054b0 at vm_fault+0x780
> >> #9 0xffffffff80f04bde at vm_fault_trap+0x6e
> >> #10 0xffffffff8106544e at trap_pfault+0x1ee
> >> #11 0xffffffff81064a9c at trap+0x44c
> >> #12 0xffffffff8103a978 at calltrap+0x8
> > 
> > In r329363 I re-worked zfs_getpages and introduced range locking to it.
> > At the time I believed that it was safe and maybe it was, please see the commit
> > message.
> > There, indeed, have been many performance / concurrency improvements to the VM
> > system and r358443 is one of them.
> 
> Thinking more about it, it could be r352176.
> I think that vm_page_grab_valid (and later vm_page_grab_valid_unlocked) are not
> equivalent to the code that they replaced.
> The original code would check valid field before any locking and it would
> attempt any locking / busing if a page is invalid.  The object was required to
> be locked though.
> The new code tries to busy the page in any case.
> 
> > I am not sure how to resolve the problem best.  Maybe someone who knows the
> > latest VM code better than me can comment on my assumptions stated in the commit
> > message.

The general trend has been to use the page busy lock as the single point
of synchronization for per-page state.  As you noted, updates to the
valid bits were previously interlocked by the object lock, but this is
coarse-grained and hurts concurrency.  I think you are right that the
range locking in getpages was ok before the recent change, but it seems
preferable to try and address this in ZFS.

> > In illumos (and, I think, in OpenZFS/ZoL) they don't have the range locking in
> > this corner of the code because of a similar deadlock a long time ago.

Do they just not implement readahead?  Can you explain exactly what the
range lock accomplishes here - is it entirely to ensure that znode block
size remains stable?