Re: quick summary results with ixgbe (was Re: datapoints on 10G throughput with TCP ?

From: Luigi Rizzo <rizzo_at_iet.unipi.it>
Date: Thu, 8 Dec 2011 11:50:51 +0100
On Thu, Dec 08, 2011 at 12:06:26PM +0200, Daniel Kalchev wrote:
> 
> 
> On 07.12.11 22:23, Luigi Rizzo wrote:
> >
> >Sorry, forgot to mention that the above is with TSO DISABLED
> >(which is not the default). TSO seems to have a very bad
> >interaction with HWCSUM and non-zero mitigation.
> 
> I have this on both sender and receiver
> 
> # ifconfig ix1
> ix1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
>         options=4bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,LRO>
>         ether 00:25:90:35:22:f1
>         inet 10.2.101.11 netmask 0xffffff00 broadcast 10.2.101.255
>         media: Ethernet autoselect (autoselect <full-duplex>)
>         status: active
> 
> without LRO on either end
> 
> # nuttcp -t -T 5 -w 128 -v 10.2.101.11
> nuttcp-t: v6.1.2: socket
> nuttcp-t: buflen=65536, nstream=1, port=5001 tcp -> 10.2.101.11
> nuttcp-t: time limit = 5.00 seconds
> nuttcp-t: connect to 10.2.101.11 with mss=1448, RTT=0.051 ms
> nuttcp-t: send window size = 131768, receive window size = 66608
> nuttcp-t: 1802.4049 MB in 5.06 real seconds = 365077.76 KB/sec = 2990.7170 Mbps
> nuttcp-t: host-retrans = 0
> nuttcp-t: 28839 I/O calls, msec/call = 0.18, calls/sec = 5704.44
> nuttcp-t: 0.0user 4.5sys 0:05real 90% 108i+1459d 630maxrss 0+2pf 87706+1csw
> 
> nuttcp-r: v6.1.2: socket
> nuttcp-r: buflen=65536, nstream=1, port=5001 tcp
> nuttcp-r: accept from 10.2.101.12
> nuttcp-r: send window size = 33304, receive window size = 131768
> nuttcp-r: 1802.4049 MB in 5.18 real seconds = 356247.49 KB/sec = 2918.3794 Mbps
> nuttcp-r: 529295 I/O calls, msec/call = 0.01, calls/sec = 102163.86
> nuttcp-r: 0.1user 3.7sys 0:05real 73% 116i+1567d 618maxrss 0+15pf 230404+0csw
> 
> with LRO on receiver
> 
> # nuttcp -t -T 5 -w 128 -v 10.2.101.11
> nuttcp-t: v6.1.2: socket
> nuttcp-t: buflen=65536, nstream=1, port=5001 tcp -> 10.2.101.11
> nuttcp-t: time limit = 5.00 seconds
> nuttcp-t: connect to 10.2.101.11 with mss=1448, RTT=0.067 ms
> nuttcp-t: send window size = 131768, receive window size = 66608
> nuttcp-t: 2420.5000 MB in 5.02 real seconds = 493701.04 KB/sec = 4044.3989 Mbps
> nuttcp-t: host-retrans = 2
> nuttcp-t: 38728 I/O calls, msec/call = 0.13, calls/sec = 7714.08
> nuttcp-t: 0.0user 4.1sys 0:05real 83% 107i+1436d 630maxrss 0+2pf 4896+0csw
> 
> nuttcp-r: v6.1.2: socket
> nuttcp-r: buflen=65536, nstream=1, port=5001 tcp
> nuttcp-r: accept from 10.2.101.12
> nuttcp-r: send window size = 33304, receive window size = 131768
> nuttcp-r: 2420.5000 MB in 5.15 real seconds = 481679.37 KB/sec = 3945.9174 Mbps
> nuttcp-r: 242266 I/O calls, msec/call = 0.02, calls/sec = 47080.98
> nuttcp-r: 0.0user 2.4sys 0:05real 49% 112i+1502d 618maxrss 0+15pf 156333+0csw
> 
> About 1/4 improvement...
> 
> With LRO on both sender and receiver
> 
> # nuttcp -t -T 5 -w 128 -v 10.2.101.11
> nuttcp-t: v6.1.2: socket
> nuttcp-t: buflen=65536, nstream=1, port=5001 tcp -> 10.2.101.11
> nuttcp-t: time limit = 5.00 seconds
> nuttcp-t: connect to 10.2.101.11 with mss=1448, RTT=0.049 ms
> nuttcp-t: send window size = 131768, receive window size = 66608
> nuttcp-t: 2585.7500 MB in 5.02 real seconds = 527402.83 KB/sec = 4320.4840 Mbps
> nuttcp-t: host-retrans = 1
> nuttcp-t: 41372 I/O calls, msec/call = 0.12, calls/sec = 8240.67
> nuttcp-t: 0.0user 4.6sys 0:05real 93% 106i+1421d 630maxrss 0+2pf 4286+0csw
> 
> nuttcp-r: v6.1.2: socket
> nuttcp-r: buflen=65536, nstream=1, port=5001 tcp
> nuttcp-r: accept from 10.2.101.12
> nuttcp-r: send window size = 33304, receive window size = 131768
> nuttcp-r: 2585.7500 MB in 5.15 real seconds = 514585.31 KB/sec = 4215.4829 Mbps
> nuttcp-r: 282820 I/O calls, msec/call = 0.02, calls/sec = 54964.34
> nuttcp-r: 0.0user 2.7sys 0:05real 55% 114i+1540d 618maxrss 0+15pf 188794+147csw
> 
> Even better...
> 
> With LRO on sender only:
> 
> # nuttcp -t -T 5 -w 128 -v 10.2.101.11
> nuttcp-t: v6.1.2: socket
> nuttcp-t: buflen=65536, nstream=1, port=5001 tcp -> 10.2.101.11
> nuttcp-t: time limit = 5.00 seconds
> nuttcp-t: connect to 10.2.101.11 with mss=1448, RTT=0.054 ms
> nuttcp-t: send window size = 131768, receive window size = 66608
> nuttcp-t: 2077.5437 MB in 5.02 real seconds = 423740.81 KB/sec = 3471.2847 Mbps
> nuttcp-t: host-retrans = 0
> nuttcp-t: 33241 I/O calls, msec/call = 0.15, calls/sec = 6621.01
> nuttcp-t: 0.0user 4.5sys 0:05real 92% 109i+1468d 630maxrss 0+2pf 49532+25csw
> 
> nuttcp-r: v6.1.2: socket
> nuttcp-r: buflen=65536, nstream=1, port=5001 tcp
> nuttcp-r: accept from 10.2.101.12
> nuttcp-r: send window size = 33304, receive window size = 131768
> nuttcp-r: 2077.5437 MB in 5.15 real seconds = 413415.33 KB/sec = 3386.6984 Mbps
> nuttcp-r: 531979 I/O calls, msec/call = 0.01, calls/sec = 103378.67
> nuttcp-r: 0.0user 4.5sys 0:05real 88% 110i+1474d 618maxrss 0+15pf 117367+0csw
> 
> 
> >also remember that hw.ixgbe.max_interrupt_rate has only
> >effect at module load -- i.e. you set it with the bootloader,
> >or with kenv before loading the module.
> 
> I have this in /boot/loader.conf
> 
> kern.ipc.nmbclusters=512000
> hw.ixgbe.max_interrupt_rate=0
> 
> on both sender and receiver.
> 
> >Please retry the measurements disabling tso (on both sides, but
> >it really matters only on the sender). Also, LRO requires HWCSUM.
> 
> How do I set HWCSUM? Is this different from RXCSUM/TXCSUM?

By HWCSUM I mean either rxcsum or txcsum.
On ixgbe, setting one also sets the other; the same goes for clearing them.
I don't remember what happens if you set lro; judging from the code,
it might automatically set rxcsum.

As you can see in your experiment, it's the sender that is starving,
and setting hw.ixgbe.max_interrupt_rate=0 is not terribly helpful
for lro.
In my case, the best conditions are:

	with hw.ixgbe.max_interrupt_rate=0:
		sender: ifconfig ix0 txcsum tso -lro
		receiver: ifconfig ix0 rxcsum lro

		txcsum and tso reduce the load on the sender,
		but lro is less effective on the receiver with
		no interrupt mitigation.

	with hw.ixgbe.max_interrupt_rate > 0:
		sender: ifconfig ix0 txcsum -tso -lro <-- note the -tso
		receiver: ifconfig ix0 rxcsum lro

		It seems that txcsum and tso together trigger some
		problem in the sender, so I have to give up one.
		A larger mitigation interval reduces the load on
		the receiver (you'll see fewer read calls).
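
The two cases above can be written down as a small sh helper. This is
only a sketch: the function name ixgbe_flags and the example rate of
8000 are made up for illustration, and the function only prints the
flags, it does not touch the interface.

```sh
#!/bin/sh
# Hypothetical helper: print the ifconfig flags suggested above for a
# given role (sender|receiver) and hw.ixgbe.max_interrupt_rate value.
ixgbe_flags() {
    role=$1
    rate=$2
    case "$role" in
    receiver)
        # Same flags on the receiver in both cases.
        echo "rxcsum lro"
        ;;
    sender)
        if [ "$rate" -eq 0 ]; then
            # No interrupt mitigation: keep tso enabled.
            echo "txcsum tso -lro"
        else
            # Non-zero mitigation: give up tso, keep txcsum.
            echo "txcsum -tso -lro"
        fi
        ;;
    *)
        echo "usage: ixgbe_flags sender|receiver rate" >&2
        return 1
        ;;
    esac
}

# Print (not run) the command for a sender with mitigation enabled;
# 8000 is an arbitrary example value.
echo "ifconfig ix0 $(ixgbe_flags sender 8000)"
```

Remember that the rate itself still has to be set before the module is
loaded; the helper only picks the matching ifconfig flags.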

Of course your blade might be slower than my test machine, so
I wouldn't expect to see the same numbers, but I do expect to see
a similar improvement/reduction when playing with the various
parameters.
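
To compare runs on different hardware, it may help to look at the
relative change rather than the absolute numbers. A minimal sh sketch,
assuming awk is available; the function name pct_change is made up for
illustration, and the two figures are the sender-side Mbps values from
the runs quoted above (no LRO vs. LRO on the receiver):

```sh
#!/bin/sh
# Hypothetical helper: percentage change from a baseline throughput
# to a new throughput (both in the same unit, e.g. Mbps).
pct_change() {
    awk -v base="$1" -v new="$2" \
        'BEGIN { printf "%.1f\n", (new - base) / base * 100 }'
}

# Sender-side figures from the runs quoted above:
# no LRO (2990.7170 Mbps) vs. LRO on the receiver (4044.3989 Mbps).
pct_change 2990.7170 4044.3989
```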

For raw hw speed you should check how much you get over the loopback
interface. I get about 15 Gbit/s on one machine and 44 Gbit/s on another
(but the latter does not have an ixgbe card in it, so I cannot test more).

cheers
luigi

> Still I get nowhere near what you get on my hardware... Here is what 
> pciconf -vlbc has to say
> 
> ix0@pci0:3:0:0: class=0x020000 card=0xffffffff chip=0x10fc8086 rev=0x01 hdr=0x00
>     vendor     = 'Intel Corporation'
>     class      = network
>     subclass   = ethernet
>     bar   [10] = type Memory, range 64, base 0xfbc00000, size 2097152, enabled
>     bar   [18] = type I/O Port, range 32, base 0xdc00, size 32, enabled
>     bar   [20] = type Memory, range 64, base 0xfbbfc000, size 16384, enabled
>     cap 01[40] = powerspec 3  supports D0 D3  current D0
>     cap 05[50] = MSI supports 1 message, 64 bit, vector masks
>     cap 11[70] = MSI-X supports 64 messages in map 0x20 enabled
>     cap 10[a0] = PCI-Express 2 endpoint max data 256(512) link x8(x8)
>     cap 03[e0] = VPD
> ecap 0001[100] = AER 1 0 fatal 0 non-fatal 1 corrected
> ecap 0003[140] = Serial 1 002590ffff363f80
> ecap 000e[150] = unknown 1
> ecap 0010[160] = unknown 1
> ix1@pci0:3:0:1: class=0x020000 card=0xffffffff chip=0x10fc8086 rev=0x01 hdr=0x00
>     vendor     = 'Intel Corporation'
>     class      = network
>     subclass   = ethernet
>     bar   [10] = type Memory, range 64, base 0xfb800000, size 2097152, enabled
>     bar   [18] = type I/O Port, range 32, base 0xd880, size 32, enabled
>     bar   [20] = type Memory, range 64, base 0xfbbf8000, size 16384, enabled
>     cap 01[40] = powerspec 3  supports D0 D3  current D0
>     cap 05[50] = MSI supports 1 message, 64 bit, vector masks
>     cap 11[70] = MSI-X supports 64 messages in map 0x20 enabled
>     cap 10[a0] = PCI-Express 2 endpoint max data 256(512) link x8(x8)
>     cap 03[e0] = VPD
> ecap 0001[100] = AER 1 0 fatal 0 non-fatal 1 corrected
> ecap 0003[140] = Serial 1 002590ffff363f80
> ecap 000e[150] = unknown 1
> ecap 0010[160] = unknown 1
> 
> I am using ix1, as the blade enclosure has only one 10G switch and it 
> happens to be on the 'second' position.
> 
> Daniel
Received on Thu Dec 08 2011 - 09:34:43 UTC
