On Thu, 15 Jan 2004, Dan Nelson wrote:

> I think you just told me why my two busiest NFS servers had to be
> rebooted a few months ago (one with 440 days of uptime :( ). Does the
> mount fail with "mount: Can't assign requested address"? If so, it also
> happens on 4.x servers. Currently, they have 214 and 109 open reserved
> ports (after 102 and 73 days uptime, respectively), and I'm betting
> there are no more than 5 files actually locked on either system. I
> wonder if it's just not closing sockets when it's done with them?

There are a number of "known bugs/features" in rpc.lockd, but I have to say that this one is new to me. The issues I know about are:

(1) There appear to be problems relating to rpc.lockd and/or rpc.statd following client reboots. I've experienced problems between a Solaris file server and a FreeBSD NFSv3 client using locking, wherein a client crash/reboot doesn't release the locks. It could be that our rpc.statd simply doesn't work...?

(2) There is a known problem involving aborted lock requests -- currently, PCATCH is disabled in the kernel tsleep() in the client, because there's no way to signal to the userspace rpc.lockd that a lock "wasn't wanted after all". If you add PCATCH back, every time you abort a lock request with a signal you leak a lock. The kernel/userspace protocol needs to be expanded a bit so that the abort can be sent to userspace, and userspace then needs to know what to do about it.

(3) There seems to be a general failure-tolerance issue in situations where rpc.lockd gets back a lock acknowledgement for a lock it didn't request. For safety, it should really release the lock, which would mask (1) and sometimes (2).

(4) There seem to be some issues with waking up processes waiting on lock requests when the lock arrives. I sent an e-mail about this a while back, and should dig it up along with my lock-testing scenarios and document this better.
(5) I think there's also a problem with leaking locks when an application requests the lock using O_NONBLOCK; the request is sent out, but bad things happen if the lock is granted.

(6) I believe there was also some problem relating to a series of processes waiting for the same lock on the same client, and not all of them eventually getting the lock. I'll dig through my past e-mail and see if I can't dig up the details.

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert_at_fledge.watson.org      Senior Research Scientist, McAfee Research

Received on Thu Jan 15 2004 - 14:38:16 UTC
This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:37:38 UTC