Re: Linux NFS ate my bge

From: Chris Hedley <freebsd-current_at_chrishedley.com>
Date: Mon, 27 Jul 2009 02:05:11 +0100 (BST)
On Wed, 22 Jul 2009, Matthew Dillon wrote:

>    TCP will likely work better, for several reasons, not the least of
>    which being that the NFS client does not have to estimate a retransmit
>    timeout on a rpc-by-rpc basis.  Such estimations fail utterly in the
>    face of a large number of concurrent RPCs because latency winds up being
>    governed by the disk backlog on the server.  A UDP mount will wind up
>    retransmitting even under completely lossless conditions.
>
>    Another reason TCP tends to work better is that UDP uses IP fragmentation
>    and IP fragmentation reassembly is not typically in the critical path.
>    The desired NFS filesystem block size is 16K (smaller will typically
>    reduce performance), so even a 9000 MTU won't help.

It's interesting how this flies in the face of the assumptions I'd made: 
I'd just guessed that UDP would somehow be the better option; I'd had some 
vague idea that it was better suited to squirting file chunks over the 
network, with TCP being a compromise.
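
Working through the fragmentation point with my own back-of-envelope 
numbers (these are my approximations, not figures from the thread, so the 
header sizes may be slightly off):

    16K NFS read/write over UDP  =  one UDP datagram of roughly 16.5KB
                                    (16384 bytes of data plus RPC, UDP
                                    and IP headers)

    at MTU 1500:  roughly 12 IP fragments per RPC
    at MTU 9000:  still 2 IP fragments per RPC

so every single RPC depends on fragment reassembly regardless of jumbo 
frames, whereas over TCP the same 16K simply goes out as ordinary segments.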

Well, my somewhat wonky assumptions aside, changing over to TCP seems to 
have fixed it: I haven't seen the problem rematerialise even under a much 
more protracted network load than before (essentially emerging [I use 
Gentoo for Linux] an update of pretty much everything).  The performance 
is better too, though it could still do with some serious improvement as 
it's a lot more sluggish than is ideal; I suspect the fault lies at the 
Linux end, though it may just be my configuration options.

Of course it took me two attempts to get TCP configured: I'd completely 
forgotten that I can't simply change it in fstab (the root filesystem is 
mounted long before fstab gets a look-in, and I wasn't having a good day 
when it came to being insightful!), so I had to change the entry in the 
pxelinux config to tell it to use TCP instead.  But I got there in the 
end, so thank you. :)
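
In case it helps anyone else who trips over the same thing, the change 
goes in the kernel append line of the pxelinux config rather than in 
fstab.  The sort of thing I mean looks roughly like this (label, server 
address and export path are invented for the example, and the exact 
options may need adjusting):

    LABEL gentoo
      KERNEL vmlinuz
      APPEND root=/dev/nfs ip=dhcp nfsroot=192.168.0.1:/export/gentoo,tcp,rsize=16384,wsize=16384

As far as I can tell from the kernel's nfsroot documentation, "tcp" (and 
the optional rsize/wsize) are standard nfsroot options, so the root mount 
picks them up without fstab ever being involved.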

>    Also use netstat ... not sure what option, I think -x, to determine the
>    actual size of the socket buffer being employed for the connection
>    (TCP or UDP).  There are multiple internal caps in the kernel and it
>    is often not as big as you might have thought it should be.  You want
>    a 256KB socket buffer at a minimum for a GigE network.  Smaller works
>    (at least for linear transfers), but you lose a lot of RPC concurrency
>    from the client.  Again, something that matters more for a linux client
>    vs a FreeBSD client.

I think this will be my next port of call in the hope of getting the 
performance up to a better standard, but there's time to experiment with 
that.  For now, I'm just happy that my FreeBSD system no longer locks up 
when it's being bombarded with requests!
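
The rough plan when I get to it (entirely untested on my part, so treat 
the exact knobs as guesses rather than gospel) is to check what the 
connection is actually getting and then raise the caps if it falls short 
of the 256KB mark:

    # on the FreeBSD server: socket buffer details for active
    # connections (assuming -x is the right option, as suggested above)
    netstat -x -p tcp

    # the kernel-imposed ceiling and the TCP defaults
    sysctl kern.ipc.maxsockbuf
    sysctl net.inet.tcp.sendspace net.inet.tcp.recvspace

    # on the Gentoo client, the equivalent limits
    sysctl net.core.rmem_max net.core.wmem_max
    sysctl net.ipv4.tcp_rmem net.ipv4.tcp_wmem

If those turn out to be well under 256KB I'll bump them up and see whether 
the sluggishness improves.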

Cheers,

Chris.