nfsd problems with FreeBSD 5.2.1

From: Mike Thomas <mwt_at_cems.umn.edu>
Date: Fri, 16 Jul 2004 12:50:53 -0500
Hello,

Alright folks, I'm in serious need of some help/advice.

I'm running FreeBSD 5.2.1 (-current) with a kernel/world built
yesterday (7/16/2004) on a dual Xeon 3.06GHz with hyperthreading
enabled. The machine also has 2GB of RAM and a SCSI RAID array on an
Intel storage RAID controller (iir0).

The machine functions as an NIS client for accounts with home directories
NFS-mounted from a Solaris 9 machine. Its primary function is as a mail
server, and what it is sharing out over NFS is the mail spool directory
(/var/mail, in this case).

I know all about the dangers of sharing out a mail spool; I don't need,
or want, a lecture about proper operating procedures in this case. It's
there for legacy purposes and will be going away in due time. Anyway,
it's with this mount that I am experiencing these NFS problems.

Now, to the nitty-gritty. I am seeing periodic spikes from one of the
nfsd children from about 10% of the CPU (via top) to 100% of the CPU.
During these spikes, even when the spike only reaches 40-50% of the
CPU, the machine becomes debilitatingly slow and stops responding to all
other commands. Even issuing an 'ls' is difficult, let alone doing
anything productive. While watching top, the nfsd state alternates
between biowr, biord, and *Giant (yes, it is even contending for the
Giant lock).
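
For reference, I'm watching the daemon states with something like this
(top's -S flag includes system processes such as nfsd, and -s1 refreshes
every second):

    top -S -s1
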
I have recompiled every piece of software that operates on /var/mail to
use only fcntl locking (procmail, postfix, and uw-imap; for the latter
there's a patch by Red Hat to do that) so that it is NFS-friendly.
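
In case the locking details matter: with those patches everything uses
fcntl()-style (POSIX) byte-range locks, which rpc.lockd can forward to
the server, rather than flock(). A minimal sketch of what that looks
like (the spool path here is just illustrative):

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int
    main(void)
    {
        /* The spool file path is only an example. */
        int fd = open("/var/mail/mwt", O_RDWR);
        if (fd == -1) {
            perror("open");
            return (1);
        }

        /* fcntl/POSIX byte-range lock -- the NFS-friendly kind. */
        struct flock fl;
        fl.l_type = F_WRLCK;    /* exclusive write lock */
        fl.l_whence = SEEK_SET;
        fl.l_start = 0;
        fl.l_len = 0;           /* 0 means "the whole file" */

        if (fcntl(fd, F_SETLKW, &fl) == -1) {   /* wait until granted */
            perror("fcntl");
            close(fd);
            return (1);
        }

        /* ... read or append mail here ... */

        fl.l_type = F_UNLCK;    /* release the lock */
        (void)fcntl(fd, F_SETLK, &fl);
        close(fd);
        return (0);
    }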

Here's what I've tried, to see if it made any difference. First, all
mounts of /var/mail from other servers were using UDP; they have all
been switched to TCP with an rsize and wsize of 1024. I've tried 4096
and 8192 as well, neither of which makes any difference. All clients are
specifically forced to use NFSv3. I have also tried both soft and hard
mounts, again with no difference in these spikes.
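
For the record, the client side now looks roughly like this in fstab
(the server name is made up, and the exact option spellings vary a bit
between client OSes):

    mailhost:/var/mail  /var/mail  nfs  rw,tcp,nfsv3,rsize=1024,wsize=1024  0  0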

I also tried switching back to the 4BSD scheduler, to see if that might
have been the issue, but it would appear that didn't make any difference
either, though the maximum load average I was seeing stayed a bit lower
with ULE than with the 4BSD scheduler.
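
For completeness, that test was just the usual one-line kernel config
change plus a rebuild, reinstall, and reboot (the config file name here
is illustrative):

    # /usr/src/sys/i386/conf/MYKERNEL
    options SCHED_4BSD      # instead of: options SCHED_ULE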

So, I'm really at the end of my rope right now; I have no idea what to
do or what could be causing this. Any advice would be great. Thanks.

--Mike