Re: r367672 broke the NFS server

From: Konstantin Belousov <kostikbel_at_gmail.com>
Date: Wed, 30 Dec 2020 19:27:08 +0200
On Wed, Dec 30, 2020 at 04:48:27PM +0000, Rick Macklem wrote:
> Kostik wrote:
> >On Wed, Dec 30, 2020 at 02:02:48AM +0000, Rick Macklem wrote:
> >> Hi,
> >>
> >> Post r367671...
> >> When multiple files are being created by an NFS client in the same
> >> directory, the VOP_CREATE()/ufs_create() can fail with ERELOOKUP.
> >> This results in a EIO return to the NFS client.
> >> --> This causes "nfsv4 client/server protocol prob err=10026"
> >>       on the client for NFSv4.0 mounts.
> >>       --> This explains why this error has been reported by
> >>             several people lately, although it should "never happen".
> >>
> >> Unfortunately, for the NFS server, the Lookup call is done separately
> >> and it will not be easy to redo it, given the current NFS code structure.
> >>
> >> Is there another way to deal with the problem r367672 was fixing that
> >> avoids ufs_create() returning ERELOOKUP?
> >
> >Idea of the change is to restart the syscall at top level.  So for NFS
> >server the right approach is to not send a response and also to not
> >free the request mbuf chain, but to restart processing.
> Yes. I took a look and I think restarting the operation by rolling the
> working position in the mbuf lists back and redoing the operation
> is feasible and easier than fixing the individual operations.
> 
> For NFSv4, you cannot redo the entire compound, since non-idempotent
> operations like exclusive open may have already been completed.
> However, rolling back to the beginning of the operation should be
> doable.
> --> It will serve as a good test, in that it may expose bugs in the
>       RPC/operation code where failure (ERELOOKUP) doesn't clean
>       things up correctly.
>       --> In NFSv4, there is the open/lock state that cannot be updated
>             for this error case. (The seqid stuff in NFSv4.0 Open can be fun.
>             Its used to serialize the operations and the number must be
>             incremented for some errors, but not for others. The 10026
>             error occurs when you don't get this right.)
Note that ERELOOKUP error can only show up from the VOPs that modify the volume.
Otherwise we simply do not call into SU.  In particular, I believe that opens
in the sense of NFS are safe.

Regardless of it, there should be either a catch-all check for ERELOOKUP,
or assert that ERELOOKUP did not leaked, as it is done for syscalls

> 
> I'll start working on this to-day, but I have no idea how long it might
> take?
> 
> >I am sorry I forgot about NFS server when designing this fix, the only
> >mild excuse I can provide is that the change was quite complicated as is.
> >I will start looking at the fix.
> No problem. Sometimes I'd like to forget about NFS too;-).
> 
> For the rollback/redo the RPC/operation case, it's probably easier for me
> to do it. As above, I'll start on it, but...
> 
> My main concern is how long it will take, given the FreeBSD13 release
> starts soon.
For sure I will help you if needed, and I believe that we could ask for
testing from Peter.
Received on Wed Dec 30 2020 - 16:27:17 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:41:26 UTC