Re: ale(4): Problems with tso, rxcsum and/or txcsum

From: Pyun YongHyeon <pyunyh_at_gmail.com>
Date: Tue, 16 Jun 2009 19:17:40 +0900
On Tue, Jun 16, 2009 at 11:33:34AM +0200, Ulrich Spörlein wrote:
> On Mon, 15.06.2009 at 21:51:54 +0900, Pyun YongHyeon wrote:
> > On Mon, Jun 15, 2009 at 02:16:23PM +0200, Ulrich Spörlein wrote:
> > > Hello Pyun,
> > > 
> > > I have connection problems with the onboard GigE of an Asus P5Q board, using a recent 8-CURRENT
> > > 
> > > ale0: <Atheros AR8121/AR8113/AR8114 PCIe Ethernet> port 0xdc00-0xdc7f mem 0xfe9c0000-0xfe9fffff irq 17 at device 0.0 on pci2
> > > ale0: 960 Tx FIFO, 1024 Rx FIFO
> > > ale0: Using 1 MSI messages.
> > > ale0: 4GB boundary crossed, switching to 32bit DMA addressing mode.
> > > miibus0: <MII bus> on ale0
> > > ale0: Ethernet address: 00:24:8c:36:3e:10
> > > ale0: [FILTER]
> > > ale0: link state changed to UP
> > > 
> > > ale0@pci0:2:0:0:        class=0x020000 card=0x82261043 chip=0x10261969 rev=0xb0 hdr=0x00
> > >     vendor     = 'Attansic (Now owned by Atheros)'
> > >     device     = 'PCI-E ETHERNET CONTROLLER  (AR8121/AR8113 )'
> > >     class      = network
> > >     subclass   = ethernet
> > > 
> > > ale0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
> > >         options=311b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,TSO4,WOL_MCAST,WOL_MAGIC>
> > >         ether 00:24:8c:36:3e:10
> > >         inet 192.168.0.146 netmask 0xffffff00 broadcast 192.168.0.255
> > >         media: Ethernet autoselect (100baseTX <full-duplex>)
> > >         status: active
> > > 
> > > 
> > > When transferring data to the machine at ~10MB/s (100Mbit network only) the ssh
> > > connection will die after a couple of minutes with
> > > 
> > > Disconnecting: Bad packet length 1592360521.
> > > 
> > > After disabling tso, txcsum and rxcsum the connection seems to be
> > > stable, though. I fail to figure out a pattern, though. Do I need to
> > 
> > Hmm, I think this is the second report that could be related to
> > Rx checksum offloading. If disabling Rx checksum fixes the issue, I
> > have to disable it by default until I understand what's going on.
> 
> I really need to disable tso, rxcsum *and* txcsum to make this card work
> stable. :/
> 

Hmm, let's find out which offload is broken. Disabling all offloads
at once makes it hard to find the broken one.
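
A rough sketch of testing one offload at a time. The DRY_RUN switch and
the run/isolate helper names are made up for illustration; rxcsum,
txcsum and tso are the usual ifconfig(8) keywords. By default it only
prints the commands it would run:

```sh
#!/bin/sh
# Sketch: leave all offloads enabled except the one under test.
# IFACE, DRY_RUN, run() and isolate() are illustrative assumptions.
IFACE=${IFACE:-ale0}
DRY_RUN=${DRY_RUN:-1}    # 1 = print commands only, 0 = execute them

run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# isolate OFFLOAD: re-enable everything, then disable only OFFLOAD.
isolate() {
    run ifconfig "$IFACE" rxcsum txcsum tso
    run ifconfig "$IFACE" "-$1"
}

# First argument selects the offload to disable; rxcsum is the default.
isolate "${1:-rxcsum}"
```

Run it once per offload (rxcsum, txcsum, tso) with DRY_RUN=0 as root,
repeat the ssh transfer after each run, and note which setting makes
the error go away.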

> There is one other weirdness, though, regarding tso. I have been using a
> netcat-blast test, where I "upload" /dev/zero to another machine, and
> "download" it from the same machine.
> 
> When tso is enabled, upload is seriously impacted, download is fine
> though, observe systat output:
> 
>            ale0  in     10.805 MB/s         11.101 MB/s            7.739 GB
>                  out     2.574 MB/s          8.740 MB/s            5.891 GB
> 
> When disabling tso, while that test is running, it will immediately become this:
> 
>            ale0  in      7.498 MB/s         11.101 MB/s            8.270 GB
>                  out     7.560 MB/s          8.740 MB/s            6.209 GB  
> 
> Which looks more normal. Re-activating tso now has no further consequences to
> the stream (it only works for new TCP sessions, right?)
> 

I was able to saturate a gigabit link with the AR8121; Tx performance
is about 930Mbps or higher. Since ale(4) does not support Ethernet
flow control, the slow upload could be caused by dropped frames.
Check the hardware MAC counters, which you can get via
"sysctl dev.ale.0.stats". The receiver should also be fast enough
to take frames without loss.
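
To see which counters move during the netcat test, something like the
following sketch could help. It assumes the "name: value" lines that
sysctl(8) prints; STATS_CMD, INTERVAL and the changed() helper are
names chosen here for illustration, not anything the driver provides:

```sh
#!/bin/sh
# Sketch: report which MAC counters changed between two snapshots.
# STATS_CMD, INTERVAL and changed() are illustrative assumptions.
STATS_CMD=${STATS_CMD:-"sysctl dev.ale.0.stats"}

# changed BEFORE AFTER: print "name delta" for every counter that differs.
changed() {
    awk -F': ' '
        NR == FNR { before[$1] = $2; next }
        ($1 in before) && $2 != before[$1] { print $1, $2 - before[$1] }
    ' "$1" "$2"
}

# Only take live snapshots when the ale(4) statistics are available.
if command -v sysctl >/dev/null 2>&1 && $STATS_CMD >/dev/null 2>&1; then
    b=$(mktemp); a=$(mktemp)
    $STATS_CMD > "$b"
    sleep "${INTERVAL:-10}"    # keep the transfer running in this window
    $STATS_CMD > "$a"
    changed "$b" "$a"
    rm -f "$b" "$a"
fi
```

Any error or drop counters that grow while the upload is slow would
point at lost frames rather than at TSO itself.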

The above does not explain OpenSSH's "Bad packet length" output,
which really means incoming packets were corrupted. So I'd like to
know whether you still see that horrible message from OpenSSH when
only Rx checksum offloading is disabled. A clean result would not
necessarily mean TSO/Tx checksum offloading works without problems,
but I'd like to narrow down the issue instead of blindly disabling
all offloads.
Received on Tue Jun 16 2009 - 08:14:03 UTC
