Re: vge traffic problem

From: Pyun YongHyeon <pyunyh_at_gmail.com>
Date: Tue, 12 Jan 2010 09:28:54 -0800
On Mon, Jan 11, 2010 at 08:29:27PM -0800, David Ehrmann wrote:
> Pyun YongHyeon wrote:
> >It seems iperf on FreeBSD is broken. It incorrectly generates
> >huge packets with an IP length of 0, so the other host dropped the
> >TCP connection. I'm not sure whether that's related to threading,
> >though. Use netperf instead; it should be more reliable than iperf.
> >  
> When I opened the cap file in Wireshark, I saw a lot of warnings
> about the length in the IP header being wrong. I'll start looking
> into netperf.
> 

Yeah, there must be a bug in iperf's thread handling. Maybe the
default configuration of iperf should be changed until the bug is
fixed.
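
If it helps, a minimal netperf run looks something like this
(192.168.0.10 stands in for the receiver's address; netserver has to
be started there first):

  # on the receiving host: start the netperf daemon
  netserver

  # on the sending host: 30 second bulk TCP transfer
  # (TCP_STREAM is netperf's default test)
  netperf -H 192.168.0.10 -l 30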

> >It's normal to see some dropped frames under high network load. And
> >you can't compare a gigabit controller to a Fast Ethernet controller.
> >  
> Very true, and that's why I tried a lower load.  I was a little 
> surprised to see it choking at just 1 Mb/s (that's bits, not bytes), though.

Even though vge(4) is not one of the best controllers, you can still
get more than 650Mbps TX and 920Mbps RX for bulk TCP transfers. For
smaller TCP segments the numbers would be much lower than that, but
that's normal for virtually all controllers. I have a local patch
which pushes TX performance up to 800Mbps for vge(4), but it requires
a fast CPU to do that, so I'm not sure whether I want to put it in
the tree or not. Since I have never seen such low TX performance
numbers on PCIe-based controllers, there could be incorrectly
programmed registers, but the datasheet says nothing about this
issue.
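
To see the smaller-segment behavior, netperf's test-specific -m
option sets the application write size; for example (the receiver
address is again a placeholder):

  # bulk TCP transfer using 1KB writes instead of the default
  netperf -H 192.168.0.10 -t TCP_STREAM -- -m 1024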

> >I have exactly the same revision of the hardware and I haven't
> >encountered your issue here. Instead of measuring performance
> >numbers with the broken iperf, check whether you still get the
> >"Connection reset by peer" message with csup(1) when you use the
> >vge(4) interface. If you still see the message, please send me a
> >tcpdump capture in private.
> >  
> csup is still working.
> 
> I actually think I *might* have the problem solved. Switching the
> mount from UDP (why it was the default for NFS in this Linux distro,
> I don't know) to TCP seems to have fixed it. My guess is that some
> sort of race condition occurred, or that there's a bug in someone's
> NFS flow-control mechanism. A 10x difference in CPU and network
> performance must be more than is usually tested. I hope.
> 

This could be related to NFS. If NFS over TCP works without
problems on vge(4), it's more likely an NFS issue.
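
For reference, forcing the TCP transport on a Linux client looks
roughly like this (server:/export and /mnt are placeholders; older
mount versions spell the option "-o tcp"):

  # mount the export over TCP instead of the distro's UDP default
  mount -t nfs -o proto=tcp server:/export /mnt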

> I'll keep testing NFS over TCP and see if it fixed my problem.
