Re: NewNFS vs. oldNFS for 10.0?

From: Rick Macklem <rmacklem_at_uoguelph.ca>
Date: Fri, 15 Mar 2013 10:08:40 -0400 (EDT)

Lars Eggert wrote:
> Hi,
> 
> this reminds me that I ran into an issue lately with the new NFS and
> locking for NFSv3 mounts on a client that ran -CURRENT and a server
> that ran -STABLE.
> 
> When I ran "portmaster -a" on the client, which mounted /usr/ports and
> /usr/local, as well as the location of the respective sqlite databases
> over NFSv3, the client network stack became unresponsive on all
> interfaces for 30 or so seconds and e.g. SSH connections broke. The
> serial console remained active throughout, and the system didn't
> crash. About a minute after the wedgie I could SSH into the box again,
> too.
> 
> The issue went away when I killed lockd on the client, but that caused
> the sqlite database to become corrupted over time. The workaround for
> me was to move to NFSv4, which has been working fine. (One more reason
> to make it the default...)
> 
I've mentioned the limitations of the NLM protocol design (rpc.lockd)
before. Any time there is any kind of network topology issue, NLM locking
will run into difficulties. There may also be other issues.

However, since both the old and new clients use the same rpc.lockd in the
same way (the new one just cribbed the code from the old one), I think
the same problem would exist for the old one. As such, I don't believe
this is a regression.

rick

> I'm not really sure how to debug this further, but would be willing to
> work with someone off-list who'd tell me what tests to run.
> 
> Lars