Re: bizarre em + TSO + MSS issue in RELENG_7

From: Jack Vogel <jfvogel_at_gmail.com> Date: Sun, 18 Nov 2007 11:40:03 -0800

On Nov 18, 2007 11:33 AM, Jack Vogel <jfvogel_at_gmail.com> wrote:
>
> On Nov 18, 2007 12:58 AM, Mike Andrews <mandrews_at_bit0.com> wrote:
> >
> > On Sat, 17 Nov 2007, Mike Andrews wrote:
> >
> > > Kip Macy wrote:
> > >> On Nov 17, 2007 5:28 PM, Mike Andrews <mandrews_at_bit0.com> wrote:
> > >>> Kip Macy wrote:
> > >>>> On Nov 17, 2007 3:23 PM, Mike Andrews <mandrews_at_bit0.com> wrote:
> > >>>>> On Sat, 17 Nov 2007, Kip Macy wrote:
> > >>>>>
> > >>>>>> On Nov 17, 2007 2:33 PM, Mike Andrews <mandrews_at_bit0.com> wrote:
> > >>>>>>> On Sat, 17 Nov 2007, Kip Macy wrote:
> > >>>>>>>
> > >>>>>>>> On Nov 17, 2007 10:33 AM, Denis Shaposhnikov <dsh_at_vlink.ru> wrote:
> > >>>>>>>>> On Sat, 17 Nov 2007 00:42:54 -0500 (EST)
> > >>>>>>>>> Mike Andrews <mandrews_at_bit0.com> wrote:
> > >>>>>>>>>
> > >>>>>>>>>> Has anyone run into problems with MSS not being respected when
> > >>>>>>>>>> using TSO, specifically on em cards?
> > >>>>>>>>> Yes, I wrote about this problem at the beginning of 2007, see
> > >>>>>>>>>
> > >>>>>>>>>     http://tinyurl.com/3e5ak5
> > >>>>>>>>>
> > >>>>>>>> if_em.c:3502
> > >>>>>>>>        /*
> > >>>>>>>>         * Payload size per packet w/o any headers.
> > >>>>>>>>         * Length of all headers up to payload.
> > >>>>>>>>         */
> > >>>>>>>>        TXD->tcp_seg_setup.fields.mss =
> > >>>>>>>>            htole16(mp->m_pkthdr.tso_segsz);
> > >>>>>>>>        TXD->tcp_seg_setup.fields.hdr_len = hdr_len;
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> Please print out the value of tso_segsz here. It appears to be
> > >>>>>>>> set correctly. The only thing I can think of is that t_maxopd is not
> > >>>>>>>> correct. As tso_segsz is correct here:
> > >>>>>>> It repeatedly prints 1368 during a 1 meg file transfer over a
> > >>>>>>> connection with a 1380 MSS.  Any other printf's I can add?  I'm
> > >>>>>>> working on a web page with tcpdump / firewall log output
> > >>>>>>> illustrating the issue...
> > >>>>>> Mike -
> > >>>>>> Denis' tcpdump output doesn't show oversized segments, something else
> > >>>>>> appears to be happening there. Can you post your tcpdump output
> > >>>>>> somewhere?
> > >>>>> URL sent off-list.
> > >>>>        if (tso) {
> > >>>>                m->m_pkthdr.csum_flags = CSUM_TSO;
> > >>>>                m->m_pkthdr.tso_segsz = tp->t_maxopd - optlen;
> > >>>>        }
> > >>>>
> > >>>>
> > >>>> Please print the value of maxopd and optlen under "if (tso)" in
> > >>>> tcp_output. I think the calculated optlen may be too small.
> > >>>
> > >>> maxopd=1380 - optlen=12 = tso_segsz=1368
> > >>>
> > >>> Weird though, after this reboot, I had to re-copy a 4 meg file 5 times
> > >>> to start getting the firewall to log any drops.  Transfer rate was
> > >>> around 240KB/sec before the firewall started to drop, then it went down
> > >>> to about 64KB/sec during the 5th copy, and stayed there for subsequent
> > >>> copies.  The actual packet size the firewall said it was dropping was
> > >>> varying all over the place still, yet the maxopd/optlen/tso_segsz values
> > >>> stayed constant.  But it's interesting that it didn't start dropping
> > >>> immediately after the reboot -- though the transfer rate was still
> > >>> sub-optimal.
> > >>
> > >> Ok, next theory :D. You shouldn't be seeing "bad len" packets from
> > >> tcpdump. I'm wondering if that means you're sending down more than
> > >> 64k. Can you please print out the value of mp->m_pkthdr.len around the
> > >> same place that you printed out tso_segsz? 64k is the generally
> > >> accepted limit for TSO, I'm wondering if the card firmware does
> > >> something weird if you give it more.
> > >
> > > OK.  In that last message, where I said it took 5 times to start reproducing
> > > the problem... this time it took until I actually toggled TSO back off and
> > > back on again, and then it started acting up again.  I don't know what the
> > > actual trigger is... it's very weird.
> > >
> > > Initially, with TSO on, before it started dropping (but while it was
> > > still transferring slowly)...
> > >
> > > BIT0 DEBUG: tso_segsz=1368  hdr_len=66  mp->m_pkthdr.len=8306
> > > BIT0 DEBUG: tso_segsz=1368  hdr_len=66  mp->m_pkthdr.len=8306
> > > BIT0 DEBUG: tso_segsz=1368  hdr_len=66  mp->m_pkthdr.len=8306
> > > BIT0 DEBUG: tso_segsz=1368  hdr_len=66  mp->m_pkthdr.len=8306
> > > (etc, always 8306)
> > >
> > > After toggling off/on which caused the drops to start (and the speed to drop
> > > even further):
> > >
> > > BIT0 DEBUG: tso_segsz=1368  hdr_len=66  mp->m_pkthdr.len=7507
> > > BIT0 DEBUG: tso_segsz=1368  hdr_len=66  mp->m_pkthdr.len=3053
> > > BIT0 DEBUG: tso_segsz=1368  hdr_len=66  mp->m_pkthdr.len=1677
> > > BIT0 DEBUG: tso_segsz=1368  hdr_len=66  mp->m_pkthdr.len=3037
> > > BIT0 DEBUG: tso_segsz=1368  hdr_len=66  mp->m_pkthdr.len=2264
> > > BIT0 DEBUG: tso_segsz=1368  hdr_len=66  mp->m_pkthdr.len=1656
> > > BIT0 DEBUG: tso_segsz=1368  hdr_len=66  mp->m_pkthdr.len=1902
> > > BIT0 DEBUG: tso_segsz=1368  hdr_len=66  mp->m_pkthdr.len=1888
> > > BIT0 DEBUG: tso_segsz=1368  hdr_len=66  mp->m_pkthdr.len=1640
> > > BIT0 DEBUG: tso_segsz=1368  hdr_len=66  mp->m_pkthdr.len=1871
> > > BIT0 DEBUG: tso_segsz=1368  hdr_len=66  mp->m_pkthdr.len=2461
> > > BIT0 DEBUG: tso_segsz=1368  hdr_len=66  mp->m_pkthdr.len=1849
> > > BIT0 DEBUG: tso_segsz=1368  hdr_len=66  mp->m_pkthdr.len=2092
> > >
> > > and so on, with more seemingly random lengths... but none of them ever over
> > > 8306, much less 64K.
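
For reference, the debug values above can be turned into the frame sizes one would expect on the wire if the NIC cut each burst exactly as instructed. This is a minimal userland sketch, not driver code: the struct and function names are hypothetical, and it assumes that hdr_len=66 covers Ethernet (14) + IPv4 (20) + TCP with timestamps (32), which the thread does not state explicitly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: how one TSO burst should be split by the NIC. */
struct tso_split {
	uint32_t nsegs;      /* number of frames the burst becomes */
	uint32_t max_frame;  /* largest frame, headers included */
	uint32_t last_frame; /* final (possibly short) frame, headers included */
};

static struct tso_split
tso_expected_split(uint32_t pkt_len, uint32_t hdr_len, uint32_t segsz)
{
	struct tso_split s;
	uint32_t payload, tail;

	assert(segsz > 0 && pkt_len > hdr_len);

	payload = pkt_len - hdr_len;                 /* TCP payload in the burst */
	s.nsegs = (payload + segsz - 1) / segsz;     /* round up */
	tail = payload - (s.nsegs - 1) * segsz;      /* payload left for last frame */
	s.last_frame = hdr_len + tail;
	/* Every frame but the last carries a full segsz of payload. */
	s.max_frame = hdr_len + (s.nsegs > 1 ? segsz : tail);
	return (s);
}
```

Under those assumptions, tso_expected_split(8306, 66, 1368) describes 7 frames, the largest being 1434 bytes including the Ethernet header, and tso_expected_split(1677, 66, 1368) describes 2 frames. Frames larger than hdr_len + tso_segsz on the wire would therefore not be explained by the values handed to the driver.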
> >
> >
> > Got a few more data points here.
> >
> > I can reproduce this on an i386 kernel, so it isn't amd64 specific.
> >
> > I can reproduce this on an 82541EI nic, so it isn't 82573 specific.
> >
> > I can't reproduce this on a Marvell Yukon II (msk) nic; it works fine
> > whether TSO is on or off.
> >
> > I can't reproduce this on a bge nic because it doesn't support TSO :)
> > That's the only other gigabit nic I've got easy access to.
> >
> > I can reproduce this with just a Cisco 877W IOS-based router and no Cisco
> > PIX / ASA firewalls in the way, with the servers on the LAN interface with
> > "ip tcp adjust-mss 1340" on it, and the downloading client on the Cisco's
> > 802.11G interface.  This time, the client is a Macbook Pro running
> > Leopard, and I'm running "tcpdump -i en1 -s 1500 -n -v length \> 1394" on
> > the Macbook (not the server this time) to find oversize packets, which is
> > actually handier because I can see how trashed they really get :)
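
The 1394 in that tcpdump filter is consistent with the clamped MSS plus fixed header sizes. A small sketch of the arithmetic follows; the constant and function names are made up for illustration, and it assumes plain IPv4 with no IP options and an untagged Ethernet header as seen by tcpdump.

```c
#include <assert.h>

/* Illustrative header sizes; assumes IPv4 without options, no VLAN tag. */
enum { ETHER_HDR = 14, IPV4_HDR = 20, TCP_HDR = 20 };

/*
 * Largest frame a sender honouring the given MSS should emit.
 * TCP options (e.g. timestamps) are taken out of the MSS budget,
 * so they reduce the payload but do not raise this total.
 */
static int
max_frame_for_mss(int mss)
{
	assert(mss > 0);
	return (mss + TCP_HDR + IPV4_HDR + ETHER_HDR);
}
```

With "ip tcp adjust-mss 1340", max_frame_for_mss(1340) is 1394, so anything matching "length > 1394" is larger than the clamped MSS allows.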
> >
> > I can't reproduce this between two machines on the same subnet (though I
> > can reproduce throughput problems alone).  I haven't tried lowering the
> > system MSS on one end yet (is there a sysctl to lower the MSS for outbound
> > connections without lowering the MTU as well?).  If I could do this it
> > would greatly simplify testing for everyone as they wouldn't have to stick
> > an MSS-clamping router in the middle.  It doesn't have to be Cisco.
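
One possible way to test without a clamping router, which is not a sysctl and was not tried in this thread, is the per-socket TCP_MAXSEG option set from a small test program. The sketch below is only an assumption about how such a test could be written; the function name is hypothetical, and stacks differ on when the option is accepted (before or after the connection is established), so the return value needs checking.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/*
 * Hypothetical helper: ask the stack to use a smaller MSS on this
 * socket, leaving the interface MTU alone.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int
clamp_socket_mss(int fd, int mss)
{
	assert(mss > 0);
	return (setsockopt(fd, IPPROTO_TCP, TCP_MAXSEG, &mss, sizeof(mss)));
}
```

A test sender would call clamp_socket_mss(fd, 1340) on its TCP socket and then transfer a file as before; whether this exercises the same path as a router rewriting the MSS option in the SYN is untested.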
> >
> > With this setup, copying to the Mac through the 877W from:
> >
> > msk-based server, TSO disabled: tcpdump reports no problems, file
> > transfers are fast
> >
> > msk-based server, TSO enabled: tcpdump reports no problems, file
> > transfers are fast
> >
> > em-based server, TSO disabled: tcpdump reports no problems, file
> > transfers are fast
> >
> > em-based server, TSO enabled: tcpdump reports numerous oversize packets of
> > varying sizes just as before, AND numerous packets with bad TCP checksums.
> > The checksum problems aren't limited to only the large packets though.
> > (That's probably what's causing the throughput problems.)  Toggling rxcsum
> > and txcsum flags on the server made no difference.  What I haven't tried
> > yet is hexdumping the packets to see what exactly is getting trashed.
> >
> > The problem still comes and goes; sometimes it'll work for a few minutes
> > after boot, sometimes not; it might be dependent on what other traffic's
> > going through the box.
>
> Hmmm, OK so the data is pointing to something in the em TSO or encap
> code. I will look into this tomorrow. So the necessary elements are systems
> on two subnets and em doing the transmitting with TSO?

BTW, not to dodge the problem, but this is a case where I'd say it's absurd
to be using TSO. Is the link at 1G or 100Mb?

Nevertheless it does point to a real bug in the code.

Jack