Re: ale(4): Problems with tso, rxcsum and/or txcsum

From: Pyun YongHyeon <pyunyh_at_gmail.com> Date: Mon, 29 Jun 2009 14:23:30 +0900

On Sat, Jun 27, 2009 at 07:11:11PM +0200, Ulrich Spörlein wrote:
> Sorry for the long delay, I only now got around testing this more
> thoroughly.
> 
> On Tue, 16.06.2009 at 19:17:40 +0900, Pyun YongHyeon wrote:
> > On Tue, Jun 16, 2009 at 11:33:34AM +0200, Ulrich Spörlein wrote:
> > > On Mon, 15.06.2009 at 21:51:54 +0900, Pyun YongHyeon wrote:
> > > > On Mon, Jun 15, 2009 at 02:16:23PM +0200, Ulrich Spörlein wrote:
> > > > > Hello Pyun,
> > > > > 
> > > > > I have connection problems with the onboard GigE of an Asus P5Q board, using a recent 8-CURRENT
> > > > > 
> > > > > ale0: <Atheros AR8121/AR8113/AR8114 PCIe Ethernet> port 0xdc00-0xdc7f mem 0xfe9c0000-0xfe9fffff irq 17 at device 0.0 on pci2
> > > > > ale0: 960 Tx FIFO, 1024 Rx FIFO
> > > > > ale0: Using 1 MSI messages.
> > > > > ale0: 4GB boundary crossed, switching to 32bit DMA addressing mode.
> > > > > miibus0: <MII bus> on ale0
> > > > > ale0: Ethernet address: 00:24:8c:36:3e:10
> > > > > ale0: [FILTER]
> > > > > ale0: link state changed to UP
> > > > > 
> > > > > ale0@pci0:2:0:0:        class=0x020000 card=0x82261043 chip=0x10261969 rev=0xb0 hdr=0x00
> > > > >     vendor     = 'Attansic (Now owned by Atheros)'
> > > > >     device     = 'PCI-E ETHERNET CONTROLLER  (AR8121/AR8113 )'
> > > > >     class      = network
> > > > >     subclass   = ethernet
> > > > > 
> > > > > ale0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
> > > > >         options=311b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,TSO4,WOL_MCAST,WOL_MAGIC>
> > > > >         ether 00:24:8c:36:3e:10
> > > > >         inet 192.168.0.146 netmask 0xffffff00 broadcast 192.168.0.255
> > > > >         media: Ethernet autoselect (100baseTX <full-duplex>)
> > > > >         status: active
> > > > > 
> > > > > When transferring data to the machine at ~10MB/s (100Mbit network only) the ssh
> > > > > connection will die after a couple of minutes with
> > > > > 
> > > > > Disconnecting: Bad packet length 1592360521.
> > > > > 
> > > > > After disabling tso, txcsum and rxcsum the connection seems to be
> > > > > stable, though. I fail to figure out a pattern. Do I need to
> > > > 
> > > > Hmm, I think this is the second report that could be related to
> > > > Rx checksum offloading. If disabling Rx checksum fixes the issue, I
> > > > have to disable it by default until I understand what's going on.
> > > 
> > > I really need to disable tso, rxcsum *and* txcsum to make this card work
> > > stable. :/
> > 
> > Hmm, let's see which offload is broken. Disabling all offloads
> > makes it hard to find the broken one.
> 
> Ok, disabling rxcsum makes the connection stable. But when I enable
> rxcsum again, it is also stable! It looks like it is not turned on
> again. To sum it up:
> 
> 1. doing nothing: ssh connection drops after a couple of minutes
> 2. ifconfig ale0 -rxcsum: ssh runs stable for dozens of minutes
> 3. ifconfig ale0 rxcsum: ssh runs stable for dozens of minutes (wtf?)
> 
> > > There is one other weirdness, though, regarding tso. I have been using a
> > > netcat-blast test, where I "upload" /dev/zero to another machine, and
> > > "download" it from the same machine.
> 
> Scrap all my previous findings regarding this issue. I re-ran the test
> with three machines. So ale0 would download from machine A and upload to
> machine B. No matter how hard I try, I can always saturate the 100MBit
> Ethernet in full duplex. Don't know how the previous numbers came about.
> 

Yeah, I still can't reproduce the issue you've mentioned, but I
think it's better to disable Rx checksum offload for now. If
I manage to find the root cause of the issue, I will enable it again
with proper workarounds.

> Thanks for your patience, but it looks like the rxcsum is indeed fishy
> on this chip revision.
> 

Committed to HEAD (r195153). You can still enable Rx checksum
offload with ifconfig(8) but it is disabled by default.
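
To check whether Rx checksum offload is currently on, look for RXCSUM
in the options= line that ifconfig prints. Below is a minimal Python
sketch of that check, not anything the driver or ifconfig(8) provides.
It parses the ifconfig output quoted earlier in this thread. The helper
names (parse_options, names_from_mask, rxcsum_enabled) are made up for
illustration, and the numeric bit values in CAP_BITS are an assumption
inferred from the 311b mask and the order of the names, so verify them
against <net/if.h> on your system.

```python
import re

# Capability bits, lowest first. The names and their order are read off the
# options=311b<...> line quoted in this thread; the numeric values are an
# assumption to check against <net/if.h> on your system.
CAP_BITS = [
    (0x0001, "RXCSUM"),
    (0x0002, "TXCSUM"),
    (0x0008, "VLAN_MTU"),
    (0x0010, "VLAN_HWTAGGING"),
    (0x0100, "TSO4"),
    (0x1000, "WOL_MCAST"),
    (0x2000, "WOL_MAGIC"),
]


def parse_options(ifconfig_text):
    """Return (mask, names) from the first options= line in ifconfig output."""
    match = re.search(r"options=([0-9a-fA-F]+)<([^>]*)>", ifconfig_text)
    if match is None:
        raise ValueError("no options= line in ifconfig output")
    mask = int(match.group(1), 16)
    names = [name for name in match.group(2).split(",") if name]
    return mask, names


def names_from_mask(mask):
    """Decode a capability mask using the (assumed) CAP_BITS table."""
    return [name for bit, name in CAP_BITS if mask & bit]


def rxcsum_enabled(ifconfig_text):
    """True if RXCSUM is listed among the enabled options."""
    _, names = parse_options(ifconfig_text)
    return "RXCSUM" in names


# Sample taken from the ifconfig output quoted in this thread.
SAMPLE = (
    "ale0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500\n"
    "        options=311b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,TSO4,WOL_MCAST,WOL_MAGIC>\n"
)

if __name__ == "__main__":
    mask, names = parse_options(SAMPLE)
    print(hex(mask), names)
    print("rxcsum enabled:", rxcsum_enabled(SAMPLE))
```

On a live system the same functions can be fed the text of
"ifconfig ale0" instead of SAMPLE. If the assumed bit values are right,
clearing RXCSUM would show up as the mask changing from 311b to 311a.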

Thanks for reporting!