Re: r367672 broke the NFS server

From: Peter Holm <peter_at_holm.cc>
Date: Wed, 30 Dec 2020 19:23:13 +0100
On Wed, Dec 30, 2020 at 07:27:08PM +0200, Konstantin Belousov wrote:
> On Wed, Dec 30, 2020 at 04:48:27PM +0000, Rick Macklem wrote:
> > Kostik wrote:
> > >On Wed, Dec 30, 2020 at 02:02:48AM +0000, Rick Macklem wrote:
> > >> Hi,
> > >>
> > >> Post r367671...
> > >> When multiple files are being created by an NFS client in the same
> > >> directory, the VOP_CREATE()/ufs_create() can fail with ERELOOKUP.
> > >> This results in a EIO return to the NFS client.
> > >> --> This causes "nfsv4 client/server protocol prob err=10026"
> > >>       on the client for NFSv4.0 mounts.
> > >>       --> This explains why this error has been reported by
> > >>             several people lately, although it should "never happen".
> > >>
> > >> Unfortunately, for the NFS server, the Lookup call is done separately
> > >> and it will not be easy to redo it, given the current NFS code structure.
> > >>
> > >> Is there another way to deal with the problem r367672 was fixing that
> > >> avoids ufs_create() returning ERELOOKUP?
> > >
> > >Idea of the change is to restart the syscall at top level.  So for NFS
> > >server the right approach is to not send a response and also to not
> > >free the request mbuf chain, but to restart processing.
> > Yes. I took a look and I think restarting the operation by rolling the
> > working position in the mbuf lists back and redoing the operation
> > is feasible and easier than fixing the individual operations.
> > 
> > For NFSv4, you cannot redo the entire compound, since non-idempotent
> > operations like exclusive open may have already been completed.
> > However, rolling back to the beginning of the operation should be
> > doable.
> > --> It will serve as a good test, in that it may expose bugs in the
> >       RPC/operation code where failure (ERELOOKUP) doesn't clean
> >       things up correctly.
> >       --> In NFSv4, there is the open/lock state that cannot be updated
> >             for this error case. (The seqid stuff in NFSv4.0 Open can be fun.
> >             Its used to serialize the operations and the number must be
> >             incremented for some errors, but not for others. The 10026
> >             error occurs when you don't get this right.)
> Note that ERELOOKUP error can only show up from the VOPs that modify the volume.
> Otherwise we simply do not call into SU.  In particular, I believe that opens
> in the sense of NFS are safe.
> 
> Regardless of it, there should be either a catch-all check for ERELOOKUP,
> or assert that ERELOOKUP did not leaked, as it is done for syscalls
> 
> > 
> > I'll start working on this to-day, but I have no idea how long it might
> > take?
> > 
> > >I am sorry I forgot about NFS server when designing this fix, the only
> > >mild excuse I can provide is that the change was quite complicated as is.
> > >I will start looking at the fix.
> > No problem. Sometimes I'd like to forget about NFS too;-).
> > 
> > For the rollback/redo the RPC/operation case, it's probably easier for me
> > to do it. As above, I'll start on it, but...
> > 
> > My main concern is how long it will take, given the FreeBSD13 release
> > starts soon.
> For sure I will help you if needed, and I believe that we could ask for
> testing from Peter.

Absolutely.
Not sure how I missed running NFS test the first time around.

- Peter

> _______________________________________________
> freebsd-current_at_freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe_at_freebsd.org"
Received on Wed Dec 30 2020 - 17:23:15 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:41:26 UTC