Re: Packet corruption in re0

From: Ian FREISLICH <ianf_at_clue.co.za>
Date: Fri, 22 Feb 2008 10:43:22 +0200
Pyun YongHyeon wrote:
> On Thu, Feb 21, 2008 at 01:18:18PM +0200, Ian FREISLICH wrote:
>  > Pyun YongHyeon wrote:
>  > > On Thu, Feb 21, 2008 at 02:47:43PM +1000, Robert Backhaus wrote:
>  > >  > On Thu, Feb 21, 2008 at 1:50 PM, Pyun YongHyeon <pyunyh_at_gmail.com> wrote:
>  > >  > > On Thu, Feb 21, 2008 at 11:03:02AM +1000, Robert Backhaus wrote:
>  > >  > >   > I am experiencing roughly 15% packet corruption on the re interface on
>  > >  > >   > my FreeBSD 7/amd64 box.
>  > >  > >   >
>  > >  > >   > FreeBSD gw.flexi.robbak.com 7.0-PRERELEASE FreeBSD 7.0-PRERELEASE #8:
>  > >  > >   > Tue Feb  5 09:49:55 EST 2008
>  > >  > >   > root_at_gw.flexi.robbak.com:/usr/obj/usr/src/sys/GW  amd64
>  > >  > >   >
>  > >  > >   > Just to make troubleshooting difficult, this problem only shows up
>  > >  > >   > after the system has been up for roughly 36 hours, depending on the
>  > >  > >   > amount of traffic.
>  > >  > >   >
>  > >  > >
>  > >  > >  I haven't looked at the attached tcpdump files, but I suspect the
>  > >  > >  instability issue is already fixed in HEAD. It hasn't been MFCed
>  > >  > >  yet, but I'll handle that within a week.
>  > >  > >
>  > >  > >  Would you try re(4) in HEAD?
>  > >  > >
>  > >  > 
>  > >  > OK, I'll do that. What is the best way to do that? csupping to "." seems a
>  > >  > bit drastic, and I don't do much with cvs proper. I take it that I should use
>  > >  > anon-cvs to grab the directory, but I don't quite know how.
>  > >  > 
>  > > 
>  > > Copy sys/dev/re/if_re.c and sys/pci/if_rlreg.h from HEAD to your box.
>  > > Because m_defrag(9) is missing in 7-PRERELEASE/RC, you also have to add
>  > > that function to if_re.c (copy m_defrag() from sys/kern/uipc_mbuf.c on
>  > > HEAD/RELENG_7 into if_re.c). That should make it build on your box.
>  > 
>  > This doesn't solve the problem that I'm seeing on re(4) interfaces.
>  > It basically shows up as quagga establishing OSPF neighbours as
>  > "Exchange/DR" when VLAN hardware tagging is enabled.  I'm running
>  > OSPF over 802.1Q vlans.  Neighbours are correctly negotiated once
>  > VLAN hardware tagging is disabled on the interface.
>  > 
>  > I'll do more debugging.
>  > 
> 
> Hmm. That sounds like a different issue to me. I don't think I changed
> any semantics in VLAN H/W tagging. Do you still see the same VLAN H/W
> tagging related issues on RELENG_7?
> 
> To narrow down the issue it would be even better to know which parts
> of the H/W assistance are broken. For example,
>  - Disable checksum offload on the VLAN interface first and check
>    whether quagga works.

You can only disable offload on the parent interface.

>  - Disable checksum offload on the parent interface and check again.
> If you can post tcpdump output for the broken connection, it may help a
> lot to diagnose the issue.

The only flag affecting this behaviour is vlanhwtag.  Various
permutations of the interface flags make no difference to this
behaviour as long as hardware tagging is enabled.

It seems like it's corrupting large packets on transmit when vlanhwtag
is enabled.  From the tcpdump output it looks like a padding or
packet length issue.
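For reference, when vlanhwtag is disabled the 802.1Q tag is written into the frame in software: 4 bytes (TPID 0x8100, then a 16-bit TCI holding a 3-bit priority, a 1-bit DEI/CFI flag and a 12-bit VLAN ID) go right after the destination and source MAC addresses. With vlanhwtag enabled, the driver instead leaves the frame untagged and asks the NIC to insert the tag on transmit. The sketch below shows the software insertion on a flat byte buffer; the helper name vlan_insert_tag is hypothetical and this is not the FreeBSD kernel code, which works on mbuf chains.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETHER_ADDRS_LEN 12     /* destination MAC (6) + source MAC (6) */
#define VLAN_TAG_LEN     4     /* TPID (2) + TCI (2) */
#define ETHERTYPE_VLAN   0x8100

/*
 * Hypothetical helper: insert an 802.1Q tag into the untagged Ethernet
 * frame in buf[0..len). The buffer must have capacity cap >= len + 4.
 * Returns the new frame length, or 0 if the frame is too short, the
 * buffer is too small, or vid/pcp are out of range.
 */
size_t vlan_insert_tag(uint8_t *buf, size_t len, size_t cap,
                       uint16_t vid, uint8_t pcp)
{
    assert(buf != NULL);
    if (len < ETHER_ADDRS_LEN + 2 || cap < len + VLAN_TAG_LEN ||
        vid > 0x0FFF || pcp > 7)
        return 0;

    /* Shift the original EtherType and payload right by 4 bytes
     * (regions overlap, so memmove rather than memcpy). */
    memmove(buf + ETHER_ADDRS_LEN + VLAN_TAG_LEN,
            buf + ETHER_ADDRS_LEN,
            len - ETHER_ADDRS_LEN);

    /* TCI: priority in bits 15-13, DEI left at 0, VLAN ID in bits 11-0. */
    uint16_t tci = (uint16_t)((pcp << 13) | vid);

    /* Both fields are stored in network (big-endian) byte order. */
    buf[ETHER_ADDRS_LEN + 0] = ETHERTYPE_VLAN >> 8;
    buf[ETHER_ADDRS_LEN + 1] = ETHERTYPE_VLAN & 0xFF;
    buf[ETHER_ADDRS_LEN + 2] = (uint8_t)(tci >> 8);
    buf[ETHER_ADDRS_LEN + 3] = (uint8_t)(tci & 0xFF);

    return len + VLAN_TAG_LEN;
}
```

For VLAN 1000 with priority 0, as in the captures here, the four inserted bytes are 81 00 03 e8.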

Here's what tcpdump on the re(4) device thinks it's transmitting:

00:08:a1:3c:32:9c > 00:90:fb:0c:89:7d, ethertype 802.1Q (0x8100), length 1510: vlan 1000, p 0, ethertype IPv4, 196.22.138.92 > 196.22.138.89: OSPFv2, Database Description, length: 1472

Here's what was actually received by the em(4) device on the
neighbour.  Note the absence of the 802.1Q header:

00:08:a1:3c:32:9c > 00:90:fb:0c:89:7d, ethertype IPv4 (0x0800), length 1506: 196.22.138.92 > 196.22.138.89: OSPFv2, Database Description, length: 1472
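The two reported frame lengths differ by exactly 4 bytes, the size of one 802.1Q tag, which is consistent with the tag simply not being inserted rather than the payload being padded or truncated. The arithmetic below assumes a 20-byte IPv4 header with no options and that tcpdump's frame length excludes the 4-byte FCS; the constant and function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

enum {
    ETHER_HDR_LEN = 14,  /* dst MAC + src MAC + EtherType */
    VLAN_TAG_LEN  = 4,   /* 802.1Q TPID + TCI */
    IPV4_HDR_LEN  = 20   /* assumption: no IP options */
};

/* Frame length as tcpdump reports it (no FCS) for an OSPF packet of
 * ospf_len bytes, with or without an 802.1Q tag. */
size_t ospf_frame_len(size_t ospf_len, int tagged)
{
    /* The IP datagram must fit in the 16-bit IPv4 total-length field. */
    assert(ospf_len <= (size_t)65535 - IPV4_HDR_LEN);

    size_t len = ETHER_HDR_LEN + IPV4_HDR_LEN + ospf_len;
    if (tagged)
        len += VLAN_TAG_LEN;
    return len;
}
```

With 1472 bytes of OSPF data this gives 14 + 20 + 1472 = 1506 bytes untagged and 1510 bytes tagged, matching the em(4) and re(4) captures respectively.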

When vlanhwtag is disabled, the re(4) device transmits:

00:90:fb:0c:89:7d > 00:08:a1:3c:32:9c, ethertype 802.1Q (0x8100), length 1510: vlan 1000, p 0, ethertype IPv4, 196.22.138.89 > 196.22.138.92: OSPFv2, Database Description, length: 1472

and the em(4) device receives:

00:08:a1:3c:32:9c > 00:90:fb:0c:89:7d, ethertype 802.1Q (0x8100), length 1510: vlan 1000, p 0, ethertype IPv4, 196.22.138.92 > 196.22.138.89: OSPFv2, Database Description, length: 1472
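To check captured frames like these programmatically, a receiver only needs the 16-bit field at byte offset 12: 0x8100 means an 802.1Q tag follows, the low 12 bits of the next 16 bits are the VLAN ID, and the 2 bytes after that are the encapsulated EtherType. A minimal C sketch, assuming Ethernet II framing on a flat buffer; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETHERTYPE_VLAN 0x8100

struct frame_info {
    int      tagged;     /* 1 if an 802.1Q tag is present */
    uint16_t vid;        /* VLAN ID (0 if untagged) */
    uint8_t  pcp;        /* priority code point (0 if untagged) */
    uint16_t ethertype;  /* EtherType of the payload, after any tag */
};

/* Read a big-endian (network order) 16-bit value. */
static uint16_t get_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Hypothetical helper: returns 0 on success, -1 if the frame is too short. */
int parse_ether_vlan(const uint8_t *frame, size_t len, struct frame_info *out)
{
    assert(frame != NULL && out != NULL);
    if (len < 14)
        return -1;

    uint16_t type = get_be16(frame + 12);
    if (type != ETHERTYPE_VLAN) {
        /* Untagged frame, e.g. the em(4) capture with vlanhwtag enabled. */
        out->tagged = 0;
        out->vid = 0;
        out->pcp = 0;
        out->ethertype = type;
        return 0;
    }

    if (len < 18)
        return -1;
    uint16_t tci = get_be16(frame + 14);
    out->tagged = 1;
    out->vid = (uint16_t)(tci & 0x0FFF);
    out->pcp = (uint8_t)(tci >> 13);
    out->ethertype = get_be16(frame + 16);
    return 0;
}
```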

Let me know if you need more detailed tcpdump output than I've provided.

Ian

--
Ian Freislich
Received on Fri Feb 22 2008 - 07:44:13 UTC
