Robert,

Here is more info that may be helpful in tracking this down. I'm now
running 8.0-R on all boxes. If I use the following settings on the box
that's running netserver:

kern.ipc.maxsockbuf=16777216
net.inet.tcp.recvbuf_max=16777216
net.inet.tcp.recvbuf_inc=524288
net.inet.tcp.hostcache.expire=1

and I leave the netperf box at default, then I get 932Mbps. But if I then
add the same settings to the box that I'm running netperf from, the speed
drops down to around 420Mbps again. What other information is needed to
help track this down?

TIA
Elliot

On Thu, Nov 19, 2009 at 9:42 AM, Elliot Finley <efinley.lists_at_gmail.com> wrote:
>
> On Thu, Nov 19, 2009 at 2:11 AM, Robert Watson <rwatson_at_freebsd.org> wrote:
>>
>> On Wed, 18 Nov 2009, Elliot Finley wrote:
>>
>>> I have several boxes running 8.0-RC3 with pretty dismal network
>>> performance. I also have some 7.2 boxes with great performance. Using
>>> iperf I did some tests:
>>>
>>> server(8.0) <- client(8.0) == 420Mbps
>>> server(7.2) <- client(7.2) == 950Mbps
>>> server(7.2) <- client(8.0) == 920Mbps
>>> server(8.0) <- client(7.2) == 420Mbps
>>>
>>> So when the server is 7.2, I have good performance regardless of
>>> whether the client is 8.0 or 7.2. When the server is 8.0, I have poor
>>> performance regardless of whether the client is 8.0 or 7.2.
>>>
>>> Has anyone else noticed this? Am I missing something simple?
>>
>> I've generally not measured regressions along these lines, but TCP
>> performance can be quite sensitive to the specific driver version and
>> hardware configuration. So far, I've generally measured significant TCP
>> scalability improvements in 8, and moderate raw TCP performance
>> improvements over real interfaces. On the other hand, I've seen
>> decreased TCP performance on the loopback due to scheduling interactions
>> with ULE on some systems (but not all -- disabling checksum
>> generate/verify has improved loopback on other systems).
>>
>> The first thing to establish is whether other similar benchmarks give
>> the same result, which might help us to narrow the issue down a bit.
>> Could you try using netperf+netserver with the TCP_STREAM test and see
>> if that differs using the otherwise identical configuration?
>>
>> Could you compare the ifconfig link configuration of 7.2 and 8.0 to make
>> sure there's not a problem with the driver negotiating, for example,
>> half duplex instead of full duplex? Also confirm that the same blend of
>> LRO/TSO/checksum offloading/etc. is present.
>>
>> Could you do "procstat -at | grep ifname" (where ifname is your
>> interface name) and send that to me?
>>
>> Another thing to keep an eye on is interrupt rates and pin sharing,
>> which are both sensitive to driver changes and ACPI changes. It wouldn't
>> hurt to compare vmstat -i rates not just on your network interface, but
>> also on other devices, to make sure there's no new aliasing. With a new
>> USB stack and plenty of other changes, additional driver code running
>> when your NIC interrupt fires would be highly measurable.
>>
>> Finally, two TCP tweaks to try:
>>
>> (1) Try disabling in-flight bandwidth estimation by setting
>>     net.inet.tcp.inflight.enable to 0. This often hurts low-latency,
>>     high-bandwidth local ethernet links, and is sensitive to many other
>>     issues including time-keeping. It may not be the "cause", but it's
>>     a useful thing to try.
>>
>> (2) Try setting net.inet.tcp.read_locking to 0, which disables the
>>     read-write locking strategy on global TCP locks. This setting, when
>>     enabled, significantly improves TCP scalability when dealing with
>>     multiple NICs or input queues, but is one of the non-trivial
>>     functional changes in TCP.
>
> Thanks for the reply. Here is some more info:
>
> netperf results:
>
> storage-price-3 root:~#>netperf -H 10.20.10.20
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.20.10.20
> (10.20.10.20) port 0 AF_INET
> Recv     Send     Send
> Socket   Socket   Message  Elapsed
> Size     Size     Size     Time     Throughput
> bytes    bytes    bytes    secs.    10^6bits/sec
>
> 4194304  4194304  4194304  10.04    460.10
>
> The interface on both boxes is em1. Both boxes (8.0-RC3) have two 4-port
> PCIe NICs in them. Trying the two TCP tweaks didn't change anything.
> While running iperf I did the procstat and vmstat:
>
> SERVER:
> storage-price-2 root:~#>ifconfig em1
> em1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
>         options=19b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4>
>         ether 00:15:17:b2:31:3d
>         inet 10.20.10.20 netmask 0xffffff00 broadcast 10.20.10.255
>         media: Ethernet autoselect (1000baseT <full-duplex>)
>         status: active
>
> storage-price-2 root:~#>procstat -at | grep em1
>     0 100040 kernel           em1 taskq        3   16 run    -
>
> storage-price-2 root:~#>vmstat -i
> interrupt                          total       rate
> irq14: ata0                        22979          0
> irq15: ata1                        23157          0
> irq16: aac0 uhci0*                  1552          0
> irq17: uhci2+                         37          0
> irq18: ehci0 uhci+                    43          0
> cpu0: timer                    108455076       2000
> irq257: em1                      2039287         37
> cpu2: timer                    108446955       1999
> cpu1: timer                    108447018       1999
> cpu3: timer                    108447039       1999
> cpu7: timer                    108447061       1999
> cpu5: timer                    108447061       1999
> cpu6: timer                    108447054       1999
> cpu4: timer                    108447061       1999
> Total                          869671380      16037
>
> CLIENT:
> storage-price-3 root:~#>ifconfig em1
> em1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
>         options=19b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4>
>         ether 00:15:17:b2:31:49
>         inet 10.20.10.30 netmask 0xffffff00 broadcast 10.20.10.255
>         media: Ethernet autoselect (1000baseT <full-duplex>)
>         status: active
>
> storage-price-3 root:~#>procstat -at | grep em1
>     0 100040 kernel           em1 taskq        3   16 run    -
>
> storage-price-3 root:~#>vmstat -i
> interrupt                          total       rate
> irq1: atkbd0                           2          0
> irq14: ata0                        22501          0
> irq15: ata1                        22395          0
> irq16: aac0 uhci0*                  5091          0
> irq17: uhci2+                        125          0
> irq18: ehci0 uhci+                    43          0
> cpu0: timer                    108421132       1999
> irq257: em1                      1100465         20
> cpu3: timer                    108412973       1999
> cpu1: timer                    108412987       1999
> cpu2: timer                    108413010       1999
> cpu7: timer                    108413048       1999
> cpu6: timer                    108413048       1999
> cpu5: timer                    108413031       1999
> cpu4: timer                    108413045       1999
> Total                          868462896      16020
>
> 7.2 BOX:
> dns1 root:~#>ifconfig em0
> em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
>         options=9b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM>
>         ether 00:13:72:5a:ff:48
>         inet X.Y.Z.7 netmask 0xffffffc0 broadcast X.Y.Z.63
>         media: Ethernet autoselect (1000baseTX <full-duplex>)
>         status: active
>
> The 8.0-RC3 boxes are being used for testing right now (production 2nd
> week of December). If you want access to them, that wouldn't be a
> problem.
>
> TIA
> Elliot

Received on Sat Nov 28 2009 - 18:36:37 UTC
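For anyone trying to reproduce the asymmetry described in the first
message, here is a minimal sketch of the one-sided vs. two-sided
buffer-tuning comparison. It assumes netserver is already running on the
peer (10.20.10.20, as in the thread), netperf is installed locally, and
that the PEER variable name is just for illustration; the sysctl names and
values are the ones quoted above. TCP_STREAM is netperf's default test and
is only made explicit here with -t.

    #!/bin/sh
    # Sketch: compare throughput before and after applying the
    # receive-buffer tuning from the thread to this (netperf) box.
    PEER=10.20.10.20

    # Baseline: this box at defaults, peer already tuned per the thread.
    netperf -H $PEER -t TCP_STREAM

    # Apply the same settings locally that were applied on the netserver
    # box; in the thread this is the step that dropped throughput from
    # ~932Mbps back to ~420Mbps.
    sysctl kern.ipc.maxsockbuf=16777216
    sysctl net.inet.tcp.recvbuf_max=16777216
    sysctl net.inet.tcp.recvbuf_inc=524288
    sysctl net.inet.tcp.hostcache.expire=1

    # Re-run the same test with both sides tuned.
    netperf -H $PEER -t TCP_STREAM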
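Similarly, the diagnostics Robert asked for can be gathered in one pass on
each box so the 7.2 and 8.0 output can be diffed directly. This is only a
sketch collecting the commands already shown in the thread; the IFACE
variable and its default are illustrative (em1 on the 8.0 boxes, em0 on
the 7.2 box).

    #!/bin/sh
    # Sketch: collect link, thread, interrupt, and TCP-tweak state for
    # one interface, for side-by-side comparison between 7.2 and 8.0.
    IFACE=${1:-em1}

    # Link negotiation and offload flags (duplex, LRO/TSO/checksum blend).
    ifconfig $IFACE

    # Kernel threads servicing the interface.
    procstat -at | grep $IFACE

    # Interrupt totals and rates, including non-NIC devices, to spot
    # possible new pin sharing or aliasing.
    vmstat -i

    # Current values of the two TCP tweaks mentioned in the thread.
    sysctl net.inet.tcp.inflight.enable net.inet.tcp.read_locking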