Re: quick summary results with ixgbe (was Re: datapoints on 10G throughput with TCP ?

From: Daniel Kalchev <daniel_at_digsys.bg>
Date: Thu, 08 Dec 2011 12:06:26 +0200
On 07.12.11 22:23, Luigi Rizzo wrote:
>
> Sorry, forgot to mention that the above is with TSO DISABLED
> (which is not the default). TSO seems to have a very bad
> interaction with HWCSUM and non-zero mitigation.

I have this on both sender and receiver:

# ifconfig ix1
ix1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
         options=4bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,LRO>
         ether 00:25:90:35:22:f1
         inet 10.2.101.11 netmask 0xffffff00 broadcast 10.2.101.255
         media: Ethernet autoselect (autoselect <full-duplex>)
         status: active
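
For reference, the capabilities listed in the options= line can be switched per interface with ifconfig. A minimal sketch of a helper for flipping LRO between runs, assuming a FreeBSD host and root privileges; set_lro is a name made up for this example and not necessarily how the runs below were set up:

```shell
# set_lro IFACE on|off: enable or disable Large Receive Offload on IFACE.
# Hypothetical helper; it only wraps ifconfig's "lro" and "-lro" keywords.
set_lro() {
    case "$2" in
        on)  ifconfig "$1" lro ;;
        off) ifconfig "$1" -lro ;;
        *)   echo "usage: set_lro IFACE on|off" >&2; return 2 ;;
    esac
}

# Example (as root): set_lro ix1 off && ifconfig ix1 | grep options
```

TSO can be toggled the same way with the tso and -tso keywords.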

Without LRO on either end:

# nuttcp -t -T 5 -w 128 -v 10.2.101.11
nuttcp-t: v6.1.2: socket
nuttcp-t: buflen=65536, nstream=1, port=5001 tcp -> 10.2.101.11
nuttcp-t: time limit = 5.00 seconds
nuttcp-t: connect to 10.2.101.11 with mss=1448, RTT=0.051 ms
nuttcp-t: send window size = 131768, receive window size = 66608
nuttcp-t: 1802.4049 MB in 5.06 real seconds = 365077.76 KB/sec = 2990.7170 Mbps
nuttcp-t: host-retrans = 0
nuttcp-t: 28839 I/O calls, msec/call = 0.18, calls/sec = 5704.44
nuttcp-t: 0.0user 4.5sys 0:05real 90% 108i+1459d 630maxrss 0+2pf 87706+1csw

nuttcp-r: v6.1.2: socket
nuttcp-r: buflen=65536, nstream=1, port=5001 tcp
nuttcp-r: accept from 10.2.101.12
nuttcp-r: send window size = 33304, receive window size = 131768
nuttcp-r: 1802.4049 MB in 5.18 real seconds = 356247.49 KB/sec = 2918.3794 Mbps
nuttcp-r: 529295 I/O calls, msec/call = 0.01, calls/sec = 102163.86
nuttcp-r: 0.1user 3.7sys 0:05real 73% 116i+1567d 618maxrss 0+15pf 230404+0csw
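
A note on units: the KB/sec and Mbps columns are consistent with 1 KB = 1024 bytes and 1 Mbps = 10^6 bits per second. A small sketch of that conversion, with kb_to_mbps as a made-up helper name:

```shell
# kb_to_mbps KBPS: convert KB/sec (assuming 1 KB = 1024 bytes)
# to Mbps (assuming 1 Mbps = 10^6 bits per second).
kb_to_mbps() {
    awk -v kb="$1" 'BEGIN { printf "%.4f\n", kb * 1024 * 8 / 1000000 }'
}

# Sender-side KB/sec figure from the run above.
kb_to_mbps 365077.76
```

Fed the sender-side figure, this reproduces the 2990.7170 Mbps that nuttcp reports.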

With LRO on the receiver only:

# nuttcp -t -T 5 -w 128 -v 10.2.101.11
nuttcp-t: v6.1.2: socket
nuttcp-t: buflen=65536, nstream=1, port=5001 tcp -> 10.2.101.11
nuttcp-t: time limit = 5.00 seconds
nuttcp-t: connect to 10.2.101.11 with mss=1448, RTT=0.067 ms
nuttcp-t: send window size = 131768, receive window size = 66608
nuttcp-t: 2420.5000 MB in 5.02 real seconds = 493701.04 KB/sec = 4044.3989 Mbps
nuttcp-t: host-retrans = 2
nuttcp-t: 38728 I/O calls, msec/call = 0.13, calls/sec = 7714.08
nuttcp-t: 0.0user 4.1sys 0:05real 83% 107i+1436d 630maxrss 0+2pf 4896+0csw

nuttcp-r: v6.1.2: socket
nuttcp-r: buflen=65536, nstream=1, port=5001 tcp
nuttcp-r: accept from 10.2.101.12
nuttcp-r: send window size = 33304, receive window size = 131768
nuttcp-r: 2420.5000 MB in 5.15 real seconds = 481679.37 KB/sec = 3945.9174 Mbps
nuttcp-r: 242266 I/O calls, msec/call = 0.02, calls/sec = 47080.98
nuttcp-r: 0.0user 2.4sys 0:05real 49% 112i+1502d 618maxrss 0+15pf 156333+0csw

That is roughly a 35% improvement over the run without LRO (about 4044 vs. 2991 Mbps on the sender side).

With LRO on both sender and receiver:

# nuttcp -t -T 5 -w 128 -v 10.2.101.11
nuttcp-t: v6.1.2: socket
nuttcp-t: buflen=65536, nstream=1, port=5001 tcp -> 10.2.101.11
nuttcp-t: time limit = 5.00 seconds
nuttcp-t: connect to 10.2.101.11 with mss=1448, RTT=0.049 ms
nuttcp-t: send window size = 131768, receive window size = 66608
nuttcp-t: 2585.7500 MB in 5.02 real seconds = 527402.83 KB/sec = 4320.4840 Mbps
nuttcp-t: host-retrans = 1
nuttcp-t: 41372 I/O calls, msec/call = 0.12, calls/sec = 8240.67
nuttcp-t: 0.0user 4.6sys 0:05real 93% 106i+1421d 630maxrss 0+2pf 4286+0csw

nuttcp-r: v6.1.2: socket
nuttcp-r: buflen=65536, nstream=1, port=5001 tcp
nuttcp-r: accept from 10.2.101.12
nuttcp-r: send window size = 33304, receive window size = 131768
nuttcp-r: 2585.7500 MB in 5.15 real seconds = 514585.31 KB/sec = 4215.4829 Mbps
nuttcp-r: 282820 I/O calls, msec/call = 0.02, calls/sec = 54964.34
nuttcp-r: 0.0user 2.7sys 0:05real 55% 114i+1540d 618maxrss 0+15pf 188794+147csw

Even better...

With LRO on sender only:

# nuttcp -t -T 5 -w 128 -v 10.2.101.11
nuttcp-t: v6.1.2: socket
nuttcp-t: buflen=65536, nstream=1, port=5001 tcp -> 10.2.101.11
nuttcp-t: time limit = 5.00 seconds
nuttcp-t: connect to 10.2.101.11 with mss=1448, RTT=0.054 ms
nuttcp-t: send window size = 131768, receive window size = 66608
nuttcp-t: 2077.5437 MB in 5.02 real seconds = 423740.81 KB/sec = 3471.2847 Mbps
nuttcp-t: host-retrans = 0
nuttcp-t: 33241 I/O calls, msec/call = 0.15, calls/sec = 6621.01
nuttcp-t: 0.0user 4.5sys 0:05real 92% 109i+1468d 630maxrss 0+2pf 49532+25csw

nuttcp-r: v6.1.2: socket
nuttcp-r: buflen=65536, nstream=1, port=5001 tcp
nuttcp-r: accept from 10.2.101.12
nuttcp-r: send window size = 33304, receive window size = 131768
nuttcp-r: 2077.5437 MB in 5.15 real seconds = 413415.33 KB/sec = 3386.6984 Mbps
nuttcp-r: 531979 I/O calls, msec/call = 0.01, calls/sec = 103378.67
nuttcp-r: 0.0user 4.5sys 0:05real 88% 110i+1474d 618maxrss 0+15pf 117367+0csw
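
To put the four runs side by side, the sender-side Mbps figures above can be compared against the run without LRO. A small sketch of that arithmetic; pct_gain is a helper name made up here, and the inputs are simply copied from the nuttcp-t lines:

```shell
# pct_gain BASE NEW: percentage change of NEW relative to BASE.
# Hypothetical helper for this comparison only.
pct_gain() {
    awk -v base="$1" -v new="$2" \
        'BEGIN { printf "%.1f\n", (new - base) / base * 100 }'
}

# Sender-side throughput without LRO on either end, in Mbps.
base=2990.7170

echo "LRO on receiver only: $(pct_gain "$base" 4044.3989)%"
echo "LRO on both ends:     $(pct_gain "$base" 4320.4840)%"
echo "LRO on sender only:   $(pct_gain "$base" 3471.2847)%"
```

These are single 5-second runs, so the percentages are only a rough guide.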


> also remember that hw.ixgbe.max_interrupt_rate has only
> effect at module load -- i.e. you set it with the bootloader,
> or with kenv before loading the module.

I have this in /boot/loader.conf:

kern.ipc.nmbclusters=512000
hw.ixgbe.max_interrupt_rate=0

on both sender and receiver.
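
Values set in loader.conf end up in the kernel environment, so one way to check what the driver saw at load time is to read them back with kenv. A minimal sketch, assuming a FreeBSD host; show_tunable is a made-up helper name:

```shell
# show_tunable NAME: print a loader tunable from the kernel environment,
# or a note if it is not set (or kenv is unavailable on this system).
show_tunable() {
    kenv -q "$1" 2>/dev/null || echo "$1: not set"
}

show_tunable hw.ixgbe.max_interrupt_rate
show_tunable kern.ipc.nmbclusters
```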

> Please retry the measurements disabling tso (on both sides, but
> it really matters only on the sender). Also, LRO requires HWCSUM.

How do I set HWCSUM? Is this different from RXCSUM/TXCSUM?

Still, on my hardware I get nowhere near what you get. Here is what 
pciconf -vlbc has to say:

ix0@pci0:3:0:0: class=0x020000 card=0xffffffff chip=0x10fc8086 rev=0x01 hdr=0x00
     vendor     = 'Intel Corporation'
     class      = network
     subclass   = ethernet
     bar   [10] = type Memory, range 64, base 0xfbc00000, size 2097152, enabled
     bar   [18] = type I/O Port, range 32, base 0xdc00, size 32, enabled
     bar   [20] = type Memory, range 64, base 0xfbbfc000, size 16384, enabled
     cap 01[40] = powerspec 3  supports D0 D3  current D0
     cap 05[50] = MSI supports 1 message, 64 bit, vector masks
     cap 11[70] = MSI-X supports 64 messages in map 0x20 enabled
     cap 10[a0] = PCI-Express 2 endpoint max data 256(512) link x8(x8)
     cap 03[e0] = VPD
ecap 0001[100] = AER 1 0 fatal 0 non-fatal 1 corrected
ecap 0003[140] = Serial 1 002590ffff363f80
ecap 000e[150] = unknown 1
ecap 0010[160] = unknown 1
ix1@pci0:3:0:1: class=0x020000 card=0xffffffff chip=0x10fc8086 rev=0x01 hdr=0x00
     vendor     = 'Intel Corporation'
     class      = network
     subclass   = ethernet
     bar   [10] = type Memory, range 64, base 0xfb800000, size 2097152, enabled
     bar   [18] = type I/O Port, range 32, base 0xd880, size 32, enabled
     bar   [20] = type Memory, range 64, base 0xfbbf8000, size 16384, enabled
     cap 01[40] = powerspec 3  supports D0 D3  current D0
     cap 05[50] = MSI supports 1 message, 64 bit, vector masks
     cap 11[70] = MSI-X supports 64 messages in map 0x20 enabled
     cap 10[a0] = PCI-Express 2 endpoint max data 256(512) link x8(x8)
     cap 03[e0] = VPD
ecap 0001[100] = AER 1 0 fatal 0 non-fatal 1 corrected
ecap 0003[140] = Serial 1 002590ffff363f80
ecap 000e[150] = unknown 1
ecap 0010[160] = unknown 1

I am using ix1 because the blade enclosure has only one 10G switch and it 
happens to be in the 'second' position.

Daniel
Received on Thu Dec 08 2011 - 09:06:42 UTC
