Re: r367672 broke the NFS server

From: Rick Macklem <rmacklem_at_uoguelph.ca> Date: Thu, 31 Dec 2020 22:56:04 +0000

Just FYI, I have put a patch up on phabricator as D27875 that seems
to fix the problem for all NFS client mounts except NFSv4.0.
NFSv4.0 will require an additional fix so that the "seqid" is
properly maintained during redos of the Open caused by
the ERELOOKUP redo.

If anyone is running a recent head kernel on a system that
exports UFS file systems over NFS, please test this patch.

Peter, can you test this?

If acquiring the patch from phabricator is awkward,
just email and I will send you a copy of the patch.

Thanks, rick
ps: If possible, I'd like to commit this patch in a
     couple of days, given the FreeBSD release schedule.

________________________________________
From: owner-freebsd-current_at_freebsd.org <owner-freebsd-current_at_freebsd.org> on behalf of Konstantin Belousov <kostikbel_at_gmail.com>
Sent: Thursday, December 31, 2020 6:40 AM
To: Rick Macklem
Cc: freebsd-current_at_freebsd.org; Alan Somers; Kirk McKusick; Mark Johnston
Subject: Re: r367672 broke the NFS server

On Thu, Dec 31, 2020 at 05:16:27AM +0000, Rick Macklem wrote:
> Rick Macklem wrote:
> >Kostik wrote:
> > >
> > >The idea of the change is to restart the syscall at top level.  So for the
> > >NFS server the right approach is to not send a response and also to not
> > >free the request mbuf chain, but to restart processing.
> > Yes. I took a look and I think restarting the operation by rolling the
> > working position in the mbuf lists back and redoing the operation
> > is feasible and easier than fixing the individual operations.
> >
> > For NFSv4, you cannot redo the entire compound, since non-idempotent
> > operations like exclusive open may have already been completed.
> > However, rolling back to the beginning of the operation should be
> > doable.
> Turned out to be quite easy. I'll stick a patch up on phabricator
> tomorrow, after I do a little more testing.
> NFSv4.0 is still broken, because it screws up the seqid, but I can
> fix that separately.
>
> I do see the code looping about 2-3 times before it gets a successful
> ufs_create(). Does that sound reasonable?
In the simple case, it can be described as follows: ERELOOKUP is returned
if the parent directory cannot be locked without sleeping, and we have to
drop the lock on the opened vnode to sleep on it. A more elaborate (but
still not precise) description is that the parent directory might also
need to be synced, in which case its parent might need to be locked, and
so on recursively.

Put slightly differently, I expect ERELOOKUP to occur when several
threads create files in the same directory.
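
For readers following the thread, the rollback-and-redo approach discussed
above (save the parse position before an operation, and on ERELOOKUP rewind
to it and redo only that operation) can be sketched in user-space C. This is
only an illustration under stated assumptions, not the code in D27875: every
sketch_* name is hypothetical, a flat byte buffer stands in for the mbuf
chain, SKETCH_ERELOOKUP is a stand-in for the kernel's ERELOOKUP, and the
attempt limit exists only so that the sketch always terminates.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's ERELOOKUP; the value here is illustrative. */
#define SKETCH_ERELOOKUP (-5)

/* Hypothetical request: a flat buffer stands in for the mbuf chain. */
struct sketch_request {
    const unsigned char *buf; /* request bytes */
    size_t len;               /* total length of the request */
    size_t pos;               /* current parse position */
    int attempts;             /* how many times an operation was tried */
};

typedef int (*sketch_op_fn)(struct sketch_request *req, void *arg);

/*
 * Run one operation, redoing it from its own start while it returns
 * SKETCH_ERELOOKUP. Only this operation is rolled back, so earlier
 * operations in the same compound are not repeated.
 */
int
sketch_run_op(struct sketch_request *req, sketch_op_fn op, void *arg,
    int max_attempts)
{
    size_t saved = req->pos; /* position at the start of this operation */
    int error = SKETCH_ERELOOKUP;
    int i;

    for (i = 0; i < max_attempts; i++) {
        req->pos = saved;    /* roll the parse position back */
        req->attempts++;
        error = op(req, arg);
        if (error != SKETCH_ERELOOKUP)
            break;
    }
    return (error);
}

/*
 * Hypothetical "create" operation: consumes 4 bytes of arguments and
 * reports SKETCH_ERELOOKUP while *arg (a contention counter) is positive.
 */
int
sketch_create(struct sketch_request *req, void *arg)
{
    int *contended = arg;

    assert(req->pos + 4 <= req->len);
    req->pos += 4;           /* consume the operation's arguments */
    if (*contended > 0) {
        (*contended)--;
        return (SKETCH_ERELOOKUP);
    }
    return (0);
}
```

With a contention counter of 2, sketch_run_op() calls sketch_create() three
times and leaves the parse position just past the operation's arguments,
which is comparable to the 2-3 attempts per create reported in this thread.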

> Here's some debug printfs for the test run of 4 concurrent compiles.
> (proc=8 is create and proc=12 is remove. Each line is an ERELOOKUP
>  retry. This is for the 4 threads, but I had the thread tid in another printf
>  and it showed 2-3 attempts for the same thread. They should be serialized
>  by the exclusive lock on the directory vnode.)
I cannot draw any conclusion from the output and its description.
Are there opens that do not result in ERELOOKUP, i.e. does the op
eventually succeed? What is the ratio of ERELOOKUP to success?

Also note that any VOP that modifies the volume's metadata might result
in ERELOOKUP.

> tryag3 stat=0 proc=8
> tryag3 stat=0 proc=8