Re: bizarre em + TSO + MSS issue in RELENG_7

From: Jack Vogel <jfvogel_at_gmail.com>
Date: Sun, 18 Nov 2007 15:49:07 -0800
On Nov 18, 2007 3:26 PM, Mike Andrews <mandrews_at_bit0.com> wrote:
>
> On Sun, 18 Nov 2007, Jack Vogel wrote:
>
> > On Nov 18, 2007 11:33 AM, Jack Vogel <jfvogel_at_gmail.com> wrote:
> >>
> >> On Nov 18, 2007 12:58 AM, Mike Andrews <mandrews_at_bit0.com> wrote:
> >>>
> >>> On Sat, 17 Nov 2007, Mike Andrews wrote:
> >>>
> >>>> Kip Macy wrote:
> >>>>> On Nov 17, 2007 5:28 PM, Mike Andrews <mandrews_at_bit0.com> wrote:
> >>>>>> Kip Macy wrote:
> >>>>>>> On Nov 17, 2007 3:23 PM, Mike Andrews <mandrews_at_bit0.com> wrote:
> >>>>>>>> On Sat, 17 Nov 2007, Kip Macy wrote:
> >>>>>>>>
> >>>>>>>>> On Nov 17, 2007 2:33 PM, Mike Andrews <mandrews_at_bit0.com> wrote:
> >>>>>>>>>> On Sat, 17 Nov 2007, Kip Macy wrote:
> >>>>>>>>>>
> >>>>>>>>>>> On Nov 17, 2007 10:33 AM, Denis Shaposhnikov <dsh_at_vlink.ru> wrote:
> >>>>>>>>>>>> On Sat, 17 Nov 2007 00:42:54 -0500 (EST)
> >>>>>>>>>>>> Mike Andrews <mandrews_at_bit0.com> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Has anyone run into problems with MSS not being respected when
> >>>>>>>>>>>>> using
> >>>>>>>>>>>>> TSO, specifically on em cards?
> >>>>>>>>>>>> Yes, I wrote about this problem at the beginning of 2007; see
> >>>>>>>>>>>>
> >>>>>>>>>>>>     http://tinyurl.com/3e5ak5
> >>>>>>>>>>>>
> >>>>>>>>>>> if_em.c:3502
> >>>>>>>>>>>        /*
> >>>>>>>>>>>         * Payload size per packet w/o any headers.
> >>>>>>>>>>>         * Length of all headers up to payload.
> >>>>>>>>>>>         */
> >>>>>>>>>>>        TXD->tcp_seg_setup.fields.mss =
> >>>>>>>>>>> htole16(mp->m_pkthdr.tso_segsz);
> >>>>>>>>>>>        TXD->tcp_seg_setup.fields.hdr_len = hdr_len;
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Please print out the value of tso_segsz here. It appears to be
> >>>>>>>>>>> set correctly. The only thing I can think of is that t_maxopd is
> >>>>>>>>>>> not correct, given that tso_segsz looks right at this point.
> >>>>>>>>>> It repeatedly prints 1368 during a 1 meg file transfer over a
> >>>>>>>>>> connection
> >>>>>>>>>> with a 1380 MSS.  Any other printfs I can add?  I'm working on a web
> >>>>>>>>>> page
> >>>>>>>>>> with tcpdump / firewall log output illustrating the issue...
> >>>>>>>>> Mike -
> >>>>>>>>> Denis' tcpdump output doesn't show oversized segments, something else
> >>>>>>>>> appears to be happening there. Can you post your tcpdump output
> >>>>>>>>> somewhere?
> >>>>>>>> URL sent off-list.
> >>>>>>>        if (tso) {
> >>>>>>>                m->m_pkthdr.csum_flags = CSUM_TSO;
> >>>>>>>                m->m_pkthdr.tso_segsz = tp->t_maxopd - optlen;
> >>>>>>>        }
> >>>>>>>
> >>>>>>>
> >>>>>>> Please print the value of maxopd and optlen under "if (tso)" in
> >>>>>>> tcp_output. I think the calculated optlen may be too small.
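> >>>>>>>
> >>>>>>> Something along these lines right under the "if (tso)" should do it
> >>>>>>> (untested, just a sketch):
> >>>>>>>
> >>>>>>>        if (tso) {
> >>>>>>>                m->m_pkthdr.csum_flags = CSUM_TSO;
> >>>>>>>                m->m_pkthdr.tso_segsz = tp->t_maxopd - optlen;
> >>>>>>>                /* debug: dump the values feeding tso_segsz */
> >>>>>>>                printf("TSO DEBUG: t_maxopd=%u optlen=%u tso_segsz=%u\n",
> >>>>>>>                    (unsigned)tp->t_maxopd, (unsigned)optlen,
> >>>>>>>                    (unsigned)m->m_pkthdr.tso_segsz);
> >>>>>>>        }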
> >>>>>>
> >>>>>> maxopd=1380 - optlen=12 = tso_segsz=1368
> >>>>>>
> >>>>>> Weird though, after this reboot, I had to re-copy a 4 meg file 5 times
> >>>>>> to start getting the firewall to log any drops.  Transfer rate was
> >>>>>> around 240KB/sec before the firewall started to drop, then it went down
> >>>>>> to about 64KB/sec during the 5th copy, and stayed there for subsequent
> >>>>>> copies.  The actual packet sizes the firewall said it was dropping
> >>>>>> still varied all over the place, yet the maxopd/optlen/tso_segsz
> >>>>>> values stayed constant.  But it's interesting that it didn't start
> >>>>>> dropping immediately after the reboot -- though the transfer rate
> >>>>>> was still sub-optimal.
> >>>>>
> >>>>> Ok, next theory :D. You shouldn't be seeing "bad len" packets from
> >>>>> tcpdump. I'm wondering if that means you're sending down more than
> >>>>> 64k. Can you please print out the value of mp->m_pkthdr.len around the
> >>>>> same place that you printed out tso_segsz? 64k is the generally
> >>>>> accepted limit for TSO; I'm wondering if the card firmware does
> >>>>> something weird if you give it more.
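> >>>>>
> >>>>> Something like this next to the mss setup in if_em.c would show all
> >>>>> three values together (again, untested sketch):
> >>>>>
> >>>>>        printf("TSO DEBUG: tso_segsz=%u  hdr_len=%d  mp->m_pkthdr.len=%d\n",
> >>>>>            (unsigned)mp->m_pkthdr.tso_segsz, hdr_len, mp->m_pkthdr.len);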
> >>>>
> >>>> OK.  In that last message, where I said it took 5 times to start reproducing
> >>>> the problem... this time it took until I actually toggled TSO off and
> >>>> back on, and then it started acting up again.  I don't know what the
> >>>> actual trigger is... it's very weird.
> >>>>
> >>>> Initially, with TSO on and before it started dropping (but still
> >>>> transferring slowly)...
> >>>>
> >>>> BIT0 DEBUG: tso_segsz=1368  hdr_len=66  mp->m_pkthdr.len=8306
> >>>> BIT0 DEBUG: tso_segsz=1368  hdr_len=66  mp->m_pkthdr.len=8306
> >>>> BIT0 DEBUG: tso_segsz=1368  hdr_len=66  mp->m_pkthdr.len=8306
> >>>> BIT0 DEBUG: tso_segsz=1368  hdr_len=66  mp->m_pkthdr.len=8306
> >>>> (etc, always 8306)
> >>>>
> >>>> After toggling TSO off and on, which caused the drops to start (and the
> >>>> speed to drop even further):
> >>>>
> >>>> BIT0 DEBUG: tso_segsz=1368  hdr_len=66  mp->m_pkthdr.len=7507
> >>>> BIT0 DEBUG: tso_segsz=1368  hdr_len=66  mp->m_pkthdr.len=3053
> >>>> BIT0 DEBUG: tso_segsz=1368  hdr_len=66  mp->m_pkthdr.len=1677
> >>>> BIT0 DEBUG: tso_segsz=1368  hdr_len=66  mp->m_pkthdr.len=3037
> >>>> BIT0 DEBUG: tso_segsz=1368  hdr_len=66  mp->m_pkthdr.len=2264
> >>>> BIT0 DEBUG: tso_segsz=1368  hdr_len=66  mp->m_pkthdr.len=1656
> >>>> BIT0 DEBUG: tso_segsz=1368  hdr_len=66  mp->m_pkthdr.len=1902
> >>>> BIT0 DEBUG: tso_segsz=1368  hdr_len=66  mp->m_pkthdr.len=1888
> >>>> BIT0 DEBUG: tso_segsz=1368  hdr_len=66  mp->m_pkthdr.len=1640
> >>>> BIT0 DEBUG: tso_segsz=1368  hdr_len=66  mp->m_pkthdr.len=1871
> >>>> BIT0 DEBUG: tso_segsz=1368  hdr_len=66  mp->m_pkthdr.len=2461
> >>>> BIT0 DEBUG: tso_segsz=1368  hdr_len=66  mp->m_pkthdr.len=1849
> >>>> BIT0 DEBUG: tso_segsz=1368  hdr_len=66  mp->m_pkthdr.len=2092
> >>>>
> >>>> and so on, with more seemingly random lengths... but none of them ever over
> >>>> 8306, much less 64K.
> >>>
> >>>
> >>> Got a few more data points here.
> >>>
> >>> I can reproduce this on an i386 kernel, so it isn't amd64 specific.
> >>>
> >>> I can reproduce this on an 82541EI nic, so it isn't 82573 specific.
> >>>
> >>> I can't reproduce this on a Marvell Yukon II (msk) nic; it works fine
> >>> whether TSO is on or off.
> >>>
> >>> I can't reproduce this on a bge nic because it doesn't support TSO :)
> >>> That's the only other gigabit nic I've got easy access to.
> >>>
> >>> I can reproduce this with just a Cisco 877W IOS-based router and no Cisco
> >>> PIX / ASA firewalls in the way, with the servers on the LAN interface with
> >>> "ip tcp adjust-mss 1340" on it, and the downloading client on the Cisco's
> >>> 802.11G interface.  This time, the client is a Macbook Pro running
> >>> Leopard, and I'm running "tcpdump -i en1 -s 1500 -n -v length \> 1394" on
> >>> the Macbook (not the server this time) to find oversize packets, which is
> >>> actually handier because I can see how trashed they really get :)
> >>>
> >>> I can't reproduce this between two machines on the same subnet (though I
> >>> can reproduce throughput problems alone).  I haven't tried lowering the
> >>> system MSS on one end yet (is there a sysctl to lower the MSS for outbound
> >>> connections without lowering the MTU as well?).  If I could do this it
> >>> would greatly simplify testing for everyone as they wouldn't have to stick
> >>> an MSS-clamping router in the middle.  It doesn't have to be Cisco.
> >>>
> >>> With this setup, copying to the Mac through the 877W from:
> >>>
> >>> msk-based server, TSO disabled: tcpdump reports no problems, file
> >>> transfers are fast
> >>>
> >>> msk-based server, TSO enabled: tcpdump reports no problems, file
> >>> transfers are fast
> >>>
> >>> em-based server, TSO disabled: tcpdump reports no problems, file
> >>> transfers are fast
> >>>
> >>> em-based server, TSO enabled: tcpdump reports numerous oversize packets of
> >>> varying sizes just as before, AND numerous packets with bad TCP checksums.
> >>> The checksum problems aren't limited to the large packets, though.
> >>> (That's probably what's causing the throughput problems.)  Toggling rxcsum
> >>> and txcsum flags on the server made no difference.  What I haven't tried
> >>> yet is hexdumping the packets to see what exactly is getting trashed.
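> >>>
> >>> (Probably just a matter of adding -X to the same tcpdump invocation,
> >>> something like
> >>>
> >>>   tcpdump -i en1 -s 1500 -n -X length \> 1394
> >>>
> >>> but I haven't gotten to that yet.)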
> >>>
> >>> The problem still comes and goes; sometimes it'll work for a few minutes
> >>> after boot, sometimes not; it might be dependent on what other traffic's
> >>> going through the box.
> >>
> >> Hmmm, OK, so the data points to something in the em TSO or encap
> >> code. I will look into this tomorrow. So the necessary elements are
> >> systems on two subnets, with em doing the transmitting with TSO?
>
> And a sub-1460 MSS on the client end OR the router doing MSS clamping,
> yes.  I can't yet reproduce it with 1500-byte MTUs or between two
> machines on the same subnet.  I definitely haven't done any tests with
> jumbos...
>
> > BTW, not to dodge the problem, but this is a case where I'd say it's absurd
> > to be using TSO. Is the link at 1G or 100Mb?
>
> It's reproducible at either speed, but I personally am perfectly happy
> leaving TSO disabled on my production boxes -- I've got my workaround, it
> performs, I'm cool.  At this point I'm pursuing a fix more for others'
> benefit because some other people are having at least throughput issues --
> and for my own weirdo curiosity.
>
> If a fix doesn't make 7.0-RELEASE (and I almost hate to say this), might
> it be worth disabling TSO by default in RELENG_7_0 but leaving it on in
> RELENG_7?
>

Mike, do me a favor: I just noticed that my 6.6.6 driver is on the
Intel download site now, and it would be valuable for me to know whether
it still has the problem. Go to downloadfinder.intel.com, select
networking, pick a NIC, then set the OS to FreeBSD, and you should find
the driver. I'm doubtful it's going to fix this, but I would still like
to know, if you would please :)
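
BTW, on your question about lowering the MSS on one end without lowering
the MTU: I don't know of a sysctl for that, but the TCP_MAXSEG socket
option on the downloading side might get you the same effect, if you set
it before connect(). Whether the stack then advertises the lower MSS in
the SYN I'd have to check, so treat this as an untested sketch:

#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <err.h>

int
main(void)
{
	int s, mss = 1340;	/* pretend the path clamps us to 1340 */

	if ((s = socket(AF_INET, SOCK_STREAM, 0)) < 0)
		err(1, "socket");
	/* set before connect(); whether the SYN advertises the lower
	   MSS is stack-dependent, so verify with tcpdump */
	if (setsockopt(s, IPPROTO_TCP, TCP_MAXSEG, &mss, sizeof(mss)) < 0)
		err(1, "setsockopt");
	/* ... then connect() and read a big file as usual ... */
	return (0);
}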

Regards,

Jack