Re: serious networking (em) performance (ggate and NFS) problem

From: Emanuel Strobl <Emanuel.Strobl_at_gmx.net>
Date: Fri, 19 Nov 2004 15:10:17 +0100
On Friday, 19 November 2004 13:56, Robert Watson wrote:
> On Fri, 19 Nov 2004, Emanuel Strobl wrote:
> > On Thursday, 18 November 2004 13:27, Robert Watson wrote:
> > > On Wed, 17 Nov 2004, Emanuel Strobl wrote:
> > > > I really love 5.3 in many ways but here're some unbelievable transfer
[...]
> Well, the claim that if_em doesn't benefit from polling is inaccurate in
> the general case, but quite accurate in the specific case.  In a box with
> multiple NIC's, using polling can make quite a big difference, not just by
> mitigating interrupt load, but also by helping to prioritize and manage
> the load, preventing live lock.  As I indicated in my earlier e-mail,

I understand; thanks for the explanation.

> It looks like the netperf TCP test is getting just under 27MB/s, or
> 214Mb/s.  That does seem on the low side for the PCI bus, but it's also

Not sure if I understand that sentence correctly: does it mean the "slow" 
400MHz PII is causing this limit? (Is that what "on the low side for the PCI 
bus" refers to?)

> instructive to look at the netperf UDP_STREAM results, which indicate that
> the box believes it is transmitting 417Mb/s but only 67Mb/s are being
> received or processed fast enough by netserver on the remote box.  This
> means you've achieved a send rate to the card of about 54Mb/s.  Note that
> you can actually do the math on cycles/packet or cycles/byte here -- with
> TCP_STREAM, it looks like some combination of recipient CPU and latency
> overhead is the limiting factor, with netserver running at 94% busy.

Hmm, I have trouble puzzling a picture out of this; my naive attempt at the 
cycles/byte math Robert mentions is below.
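(Assuming I'm reading the numbers right: the receiver is the 400MHz PII with 
netserver at ~94% busy while sinking ~27MB/s, so roughly

	0.94 * 400e6 cycles/s / 27e6 bytes/s  ~=  14 cycles/byte

or, at ~1460 data bytes per full TCP segment, on the order of 20,000 cycles 
per packet. If that's about right, the receiving CPU rather than the wire 
would be the limit.)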

>
> Could you try using geom gate to export a malloc-backed md device, and see
> what performance you see there?  This would eliminate the storage round

It's a pleasure:

test2:~#15: dd if=/dev/zero of=/mdgate/testfile bs=16k count=6000
6000+0 records in
6000+0 records out
98304000 bytes transferred in 5.944915 secs (16535812 bytes/sec)
test2:~#17: dd if=/mdgate/testfile of=/dev/null bs=16k
6000+0 records in
6000+0 records out
98304000 bytes transferred in 5.664384 secs (17354755 bytes/sec)

This time there's no difference between the disk and the memory filesystem, 
but on another machine with an ICH2 chipset and a 3ware controller (my 
current production system, which I'm trying to replace with this project) 
there was a big difference. The corresponding message is attached.
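
For reference, the setup behind the dd runs above was roughly the following 
(from memory, so unit numbers, sizes and addresses are approximate):

On test1 (the exporting box):

	mdconfig -a -t malloc -s 256m -u 10        # malloc-backed /dev/md10
	echo "192.168.1.2 RW /dev/md10" >> /etc/gg.exports
	ggated                                     # serves /etc/gg.exports

On test2 (the box running the dd above):

	ggatec create -u 0 192.168.1.1 /dev/md10   # creates /dev/ggate0
	newfs /dev/ggate0
	mkdir -p /mdgate && mount /dev/ggate0 /mdgate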

Thanks,

-Harry

> trip and guarantee the source is in memory, eliminating some possible
> sources of synchronous operation (which would increase latency, reducing
> throughput).  Looking at CPU consumption here would also be helpful, as it
> would allow us to reason about where the CPU is going.
>
> > I was aware of that, and since I lack a GbE switch anyway, I decided to
> > use a simple cable ;)
>
> Yes, this is my favorite configuration :-).
>
> > > (5) Next, I'd measure CPU consumption on the end box -- in particular,
> > > use top -S and systat -vmstat 1 to compare the idle condition of the
> > > system and the system under load.
> >
> > I additionally added these values to the netperf results.
>
> Thanks for your very complete and careful testing and reporting :-).
>
> Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
> robert_at_fledge.watson.org      Principal Research Scientist, McAfee Research
>
> _______________________________________________
> freebsd-current_at_freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe_at_freebsd.org"

attached mail follows:


On Tuesday, 2 November 2004 19:56, Doug White wrote:
> On Tue, 2 Nov 2004, Robert Watson wrote:
> > On Tue, 2 Nov 2004, Emanuel Strobl wrote:
> > > It's an IDE RAID controller (3ware 7506-4, a real one), and the file is
> > > indeed huge, but not abnormally so. I have a hard-disk video recorder,
> > > so I have lots of 700MB files. Also, if I copy my photo collection from
> > > the server it takes 5 minutes, but copying _to_ the server takes almost
> > > 15 minutes, and the average file size is 5MB. Fast Ethernet isn't
> > > really suitable for my needs, but at least 10MB/s should be reachable.
> > > I can't imagine I'd get better speeds when I upgrade to GbE (which the
> > > important boxes already are, just not the switch), because NFS in its
> > > current state isn't able to saturate a 100baseTX line, at least in one
> > > direction. That's the really astonishing thing for me. Why do reads
> > > saturate 100baseTX while writes manage only a third of that?
> >
> > Have you tried using tcpdump/ethereal to see if there's any significant
> > packet loss (for good reasons or not) going on?  Lots of RPC retransmits
> > would certainly explain the lower performance, and if that's not it, it
> > would be good to rule out.  The traces might also provide some insight
> > into the specific I/O operations, letting you see what block sizes are in
> > use, etc.  I've found that dumping to a file with tcpdump and reading
> > with ethereal is a really good way to get a picture of what's going on
> > with NFS: ethereal does a very nice job decoding the RPCs, as well as
> > figuring out what packets are related to each other, etc.
>
> It'd also be nice to know the mount options (nfs blocksizes in
> particular).

I haven't done intensive wire-dumps yet (when I do, I'll capture along the 
lines of the tcpdump command below), but I've figured out some oddities.
My main problem seems to be the 3ware controller in combination with NFS. If I 
create a malloc-backed md0, I can push more than 9MB/s to it with UDP and more 
than 10MB/s with TCP (both without modifying the r/w sizes).
I can also copy a 100MB file from twed0s1d to twed0s1e (so from and to the same 
RAID5 array, which is the worst case) at 15MB/s, so the array can't be the 
bottleneck.
Only when I push to the RAID5 array via NFS do I get a mere 4MB/s, no matter 
whether I use UDP, TCP or nonstandard r/w sizes.
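
Re the wire-dumps: when I do them, I plan to capture roughly like this 
(interface name and addresses are placeholders for my setup) and then read 
the file back in ethereal to look for RPC retransmits:

	tcpdump -i em0 -s 0 -w /tmp/nfs.pcap host 192.168.1.1 and port 2049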

The next thing I found is that if I tune -w to anything higher than the 
standard 8192, the average transfer rate for one big file degrades with UDP 
but increases with TCP (as I would expect).
UDP transfers seem to hiccup with -w tuned: rates peak at 8MB/s, but the next 
second they sit at 0-2MB/s (watched with systat -vm 1), while with TCP 
everything runs smoothly, regardless of the -w value.
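
For completeness, the two mount variants I'm comparing look roughly like this 
(server and path are placeholders; UDP is the default transport, -T switches 
to TCP):

	mount_nfs -w 16384 server:/export /mnt
	mount_nfs -T -w 16384 server:/export /mnt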

Now back to my real problem: can you imagine NFS and twe blocking each other, 
or something like that? Why do I get such bad transfer rates when both parts 
are in use, while each part on its own seems to work fine?

Thanks for any help,

-Harry
