Re: CURRENT: net/igb broken

From: O. Hartmann <ohartman_at_zedat.fu-berlin.de>
Date: Mon, 5 Oct 2015 07:23:55 +0200
On Fri, 2 Oct 2015 08:52:57 -0700
Sean Bruno <sbruno_at_freebsd.org> wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA512
> 
> 
> 
> On 10/02/15 00:47, O. Hartmann wrote:
> > On Thu, 01 Oct 2015 15:39:11 +0000 Eric Joyner <ricera10_at_gmail.com>
> > wrote:
> > 
> >> Oliver,
> >> 
> >> did you try Sean's suggestion?
> >> 
> >> - Eric
> >> 
> >> On Tue, Sep 22, 2015 at 1:10 PM Sean Bruno <sbruno_at_freebsd.org>
> >> wrote:
> >> 
> > 
> > 
> > On 09/21/15 23:23, O. Hartmann wrote:
> >>>>> On Mon, 21 Sep 2015 21:13:18 +0000 Eric Joyner
> >>>>> <ricera10_at_gmail.com> wrote:
> >>>>> 
> >>>>>> If you do a diff between r288057 and r287761, there are
> >>>>>> no differences between the sys/dev/e1000, sys/modules/em,
> >>>>>> and sys/modules/igb directories. Are you sure r287761
> >>>>>> actually works?
> >>>>> 
> >>>>> I'm quite sure r287761 works (and r287762 doesn't), double
> >>>>> checked this this morning again. I also checked r288093 and
> >>>>> it is still not working.
> >>>>> 
> >>>>> To ensure that I'm not the culprit and stupid here:
> >>>>> 
> >>>>> I use a NanoBSD environment and the only thing that gets
> >>>>> exchanged, is the underlying OS/OS revision. The
> >>>>> configuration always stays the same. The base system for
> >>>>> all of my tests is built from a clean source - (deleted
> >>>>> obj/ dir, clean, fresh build into obj/ for every test I
> >>>>> ran).
> >>>>> 
> >>>>> I realised a funny thing. Playing around with
> >>>>> enabling/disabling TSO (I have been told that could be the
> >>>>> culprit in an earlier Email from this list) with the
> >>>>> command sequence:
> >>>>> 
> >>>>> ifconfig igb1 down
> >>>>> ifconfig igb1 -tso
> >>>>> ifconfig igb1 up
> >>>>> ifconfig igb1 down
> >>>>> ifconfig igb1 tso
> >>>>> ifconfig igb1 up
> >>>>> . . .
> >>>>> 
> >>>>> while a ping runs in the background against a remote host
> >>>>> connected to that specific interface, the ping works for a
> >>>>> while and then dies after roughly 10 - 20 round trips. I can
> >>>>> reproduce this.
> >>>>> 
> >>>>> Is that observation of any help?
> >>>>> 
> >>>>> Regards,
> >>>>> 
> >>>>> oh
> >>>>> 
> >>>>>> 
> >>>>>> On Mon, Sep 21, 2015 at 1:58 AM O. Hartmann 
> >>>>>> <ohartman_at_zedat.fu-berlin.de> wrote:
> >>>>>> 
> >>>>>>> On Sat, 19 Sep 2015 11:23:44 -0700 Sean Bruno 
> >>>>>>> <sbruno_at_freebsd.org> wrote:
> >>>>>>> 
> >>>>> 
> >>>>> 
> >>>>> On 09/18/15 10:20, Eric Joyner wrote:
> >>>>>>>>>> He has an i210 -- he would want to revert 
> >>>>>>>>>> e1000_i210.[ch], too.
> >>>>>>>>>> 
> >>>>>>>>>> Sorry for the thrash Sean -- it sounds like it
> >>>>>>>>>> would be a good idea for you to revert this
> >>>>>>>>>> patch, and Jeff and I can go look at trying these
> >>>>>>>>>> shared code updates and igb changes internally
> >>>>>>>>>> again. We at Intel really could've done a better
> >>>>>>>>>> job of making sure these changes worked across a
> >>>>>>>>>> wider variety of devices.
> >>>>>>>>>> 
> >>>>>>>>>> - Eric
> >>>>> 
> >>>>> I've reverted the changes to head.  I'll reopen the reviews
> >>>>> and we can proceed from there.
> >>>>> 
> >>>>> sean
> >>>>> 
> >>>>> 
> >>>>>>>>>> 
> >>>>>>>>>> On Fri, Sep 18, 2015 at 9:50 AM Sean Bruno 
> >>>>>>>>>> <sbruno_at_freebsd.org <mailto:sbruno_at_freebsd.org>>
> >>>>>>>>>> wrote:
> >>>>>>>>>> 
> >>>>>>>>>> 
> >>>>>>>>>>> 
> >>>>>>>>>>> r287762 broke the system
> >>>>>>>>>> 
> >>>>>>>>>> 
> >>>>>>>>>> Before I revert this changeset *again* can you
> >>>>>>>>>> test revert r287762 from if_igb.c, e1000_82575.c
> >>>>>>>>>> and e1000_82575.h *only*
> >>>>>>>>>> 
> >>>>>>>>>> That narrows down the change quite a bit.
> >>>>>>>>>> 
> >>>>>>>>>> sean

[...]

> >>>>>>>
> >>>>>>>>>> 
> >>>>>>> I'm now on r288057 on that specific machine, with the changes
> >>>>>>> that have seemingly been identified as the culprit supposedly
> >>>>>>> reverted. Still NO change in behaviour!
> >>>>>>> 
> >>>>>>> r287761 works with the same configuration on igb
> >>>>>>> (i210), any further does not. No ping/connect from the
> >>>>>>> outside, no ping/connect from the inside. Tried
> >>>>>>> different protocols (SAMBA, ssh, LDAP, DNS). Affected
> >>>>>>> is/are only boxes with the igb driver and i210 chipset
> >>>>>>> (we do not have other chips covered by igb).
> >>>>>>> 
> >>>>>>> Regards, Oliver

[...]
 
> > 
> > For my entertainment (and HPS's), can you run HEAD and revert
> > r287775?
> > 
> > sean

[...]
 
> > I did as suggested:
> > 
> > I checked out the most recent HEAD of CURRENT this morning, which
> > was r288474 for me. I then applied "svn merge -c -287775 .",
> > which reverted(?) only r287775, which has something to do with
> > tcp_output.c or so. I do not remember.
> > 
> 
> Thanks.  This is what I intended.
> 
> 
> > I recompiled a fresh world (cleaning up /usr/obj completely by
> > deleting the folder) and tried running the target system with the
> > created image.
> > 
> > Result: the same as with any revision after r287761, it doesn't
> > work. I reverted back to r287761, which works for me on the
> > specific target hardware (Fujitsu Primergy RX 1330 M1).
> > 
> 
> What's really confusing me is that I've reverted r287762 and you are
> still having problems.

It confuses me as well. I'm about to walk through the commits to check whether
something else could have an influence, say, a change in the way things work
due to configuration et cetera. Since I use a NanoBSD image on that very
specific system, the configuration is always the very same; only the
underlying OS changes with the revision.

Another observation I made is also very strange: on the most recent CURRENT,
when I flap the state of the igb network interface by bringing it down and up
repeatedly, I sometimes (not always, but reproducibly) get a connection: pings
go through for a couple of packets, but not more than 10 in the tests I have
run so far.
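
For anyone who wants to repeat the flapping, here is a minimal sketch of one
way to script it. It is not the exact procedure used above: it pings after
each cycle instead of keeping a ping running in the background, and it also
toggles TSO as in the earlier mail. The function names (run, flap_cycle,
flap_loop) are made up for illustration, 192.0.2.1 is a documentation
placeholder for the remote host, and by default the script only prints the
commands (DRY_RUN=1) rather than touching the interface.

```sh
#!/bin/sh
# Sketch only: helper names are hypothetical, not from any FreeBSD tool.

# Print a command in dry-run mode (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# One flap cycle: down, set the TSO option, up, then ping the peer.
# $1 = interface, $2 = ping target, $3 = "tso" or "-tso"
flap_cycle() {
    run ifconfig "$1" down
    run ifconfig "$1" "$3"
    run ifconfig "$1" up
    run ping -c 20 "$2"
}

# Run $3 cycles, alternating between -tso and tso.
flap_loop() {
    i=0
    while [ "$i" -lt "$3" ]; do
        if [ $((i % 2)) -eq 0 ]; then opt="-tso"; else opt="tso"; fi
        flap_cycle "$1" "$2" "$opt"
        i=$((i + 1))
    done
}
```

After sourcing the file, "flap_loop igb1 192.0.2.1 4" prints the command
sequence; setting DRY_RUN=0 as root would actually run it. The link may need
a few seconds after "up" before the first ping can succeed, so early losses
in a cycle are not necessarily the bug.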

> 
> Can you set bootverbose (boot_verbose="YES" in loader.conf) with the
> current version of -CURRENT and post the dmesg somewhere for me to
> look at?

Yes, of course, but in the worst case not before Wednesday, since we have to
perform some tests on that specific system today and Tuesday and I'm currently
running the working revision r287761. It's a bit complicated due to the fact
that the system is isolated from the internet so far: I have to pull the dmesg
and save it to a flash drive, which I have to do on-site, and I'm not on-site
at the moment.
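
A rough sketch of that on-site step, under some assumptions: with
boot_verbose="YES" in /boot/loader.conf and a reboot, the verbose boot
messages normally end up in /var/run/dmesg.boot on FreeBSD (assumed here to
hold for the NanoBSD image as well). The helper name save_boot_log and the
mount point in the usage line are hypothetical.

```sh
#!/bin/sh
# Sketch only: copy a boot log to a directory, tagging the file name.
# $1 = source log (on FreeBSD typically /var/run/dmesg.boot)
# $2 = destination directory, e.g. a mounted flash drive
# $3 = tag for the file name, e.g. the svn revision being tested
# Prints the path of the copy on success.
save_boot_log() {
    cp "$1" "$2/dmesg-$3.txt" && echo "$2/dmesg-$3.txt"
}
```

Used as "save_boot_log /var/run/dmesg.boot /mnt/usb r288474" once the flash
drive is mounted; tagging the file with the revision keeps logs from the
working and the broken builds apart.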

> 
> sean

Oliver
Received on Mon Oct 05 2015 - 03:24:00 UTC
