Re: CURRENT: re(4) crashing system

From: O. Hartmann <ohartman_at_zedat.fu-berlin.de>
Date: Sat, 19 Nov 2016 19:44:35 +0100
On Mon, 7 Nov 2016 11:16:23 +0900
YongHyeon PYUN <pyunyh_at_gmail.com> wrote:

> On Sun, Nov 06, 2016 at 01:20:36PM +0100, Hartmann, O. wrote:
> > On Mon, 31 Oct 2016 11:12:22 +0900
> > YongHyeon PYUN <pyunyh_at_gmail.com> wrote:
> >   
> > > On Fri, Oct 28, 2016 at 09:21:13PM +0200, Hartmann, O. wrote:  
> > > > On Thu, 27 Oct 2016 10:00:04 +0900
> > > > YongHyeon PYUN <pyunyh_at_gmail.com> wrote:
> > > >     
> > > > > On Tue, Oct 25, 2016 at 07:03:38AM +0200, Hartmann, O. wrote:    
> > > > > > On Tue, 25 Oct 2016 11:05:38 +0900
> > > > > > YongHyeon PYUN <pyunyh_at_gmail.com> wrote:
> > > > > >       
> > > > > 
> > > > > [...]
> > > > >     
> > > > > > > I'm not sure but it's likely the issue is related with
> > > > > > > EEE/Green Ethernet handling. EEE is negotiated feature with
> > > > > > > link partner. If you directly connect your laptop to non-EEE
> > > > > > > capable link partner like other re(4) box without switches
> > > > > > > you may be able to tell whether the issue is EEE/Green
> > > > > > > Ethernet related one or not.      
> > > > > > 
> > > > > > > Me neither, since when I first discovered a problem with
> > > > > > > CURRENT, on the Friday before last week's Friday, there
> > > > > > > was an unlucky coincidence: I got the new switch, FreeBSD
> > > > > > > introduced a serious bug and I changed the NICs.
> > > > > > 
> > > > > > > The laptop, the last in the row of re(4)-equipped systems on
> > > > > > which I use the Realtek NIC, does well now with Green IT
> > > > > > technology, but crashes on plugging/unplugging - not on each
> > > > > > event, but at least in one of ten.      
> > > > > 
> > > > > Hmm, it seems you know how to trigger the issue. When you unplug
> > > > > UTP cable was there active network traffic on re(4) device?
> > > > > It would be helpful to know which event triggers the crash(e.g.
> > > > > unplugging or plugging).  And would you show me backtrace of
> > > > > panic?   
> > > > > > > I guess the Green IT issue is more an unlucky guess of mine and
> > > > > > went hand in hand with the problem I face with CURRENT right
> > > > > > now on some older, Non UEFI machines.
> > > > > >       
> > > > > 
> > > > > Ok.
> > > > > 
> > > > > [...]    
> > > > > > 
> > > > > > > As requested, the information about re0 and rgephy0 on the
> > > > > > laptop (Lenovo E540) 
> > > > > > 
> > > > > > [...]
> > > > > > 
> > > > > > rgephy0: <RTL8251 1000BASE-T media interface> PHY 1 on miibus0
> > > > > > rgephy0:  none, 10baseT, 10baseT-FDX, 10baseT-FDX-flow,
> > > > > > 100baseTX, 100baseTX-FDX, 100baseTX-FDX-flow, 1000baseT-FDX,
> > > > > > 1000baseT-FDX-master, 1000baseT-FDX-flow,
> > > > > > 1000baseT-FDX-flow-master, auto, auto-flow
> > > > > > 
> > > > > > re0: <RealTek 8168/8111 B/C/CP/D/DP/E/F/G PCIe Gigabit Ethernet>
> > > > > > port 0x3000-0x30ff mem
> > > > > > 0xf0d04000-0xf0d04fff,0xf0d00000-0xf0d03fff at device 0.0 on
> > > > > > pci2 re0: Using 1 MSI-X message re0: ASPM disabled re0: Chip
> > > > > > rev. 0x50800000 re0: MAC rev. 0x00100000      
> > > > > 
> > > > > This looks like 8168GU controller.
> > > > > 
> > > > > [...]
> > > > >     
> > > > > > I use options netmap in kernel config, but the problem is also
> > > > > > present without this option - just for the record.
> > > > > >       
> > > > > 
> > > > > Yup, netmap(4) has nothing to do with the crash.
> > > > > 
> > > > > Thanks.    
> > > > 
> > > > Attached, you'll find the backtrace of the crash. This time it was
> > > > really easy - just one pull of the LAN cabling - and we are
> > > > happy :-/
> > > > 
> > > > Please let me know if you need something else. I will return to
> > > > normal operations (disabling debugging) because CURRENT is very
> > > > unstable at the moment on other hosts beyond r307157.
> > > >     
> > > 
> > > It seems the attachment was stripped.  
> > 
> > This time I hope I got it right!
> > 
> > Attached you'll find the latest CURRENT's backtrace on the provoked
> > crash (plug and unplug).
> > 
> > I also saved the kernel and coredump, so if you need me to do further
> > investigations, please let me know.
> >   
> 
> Thanks a lot for the backtrace.  This backtrace is not the one I
> expected and I guess the issue is related to cached route removal
> on interface down.  A quick look over the code didn't reveal the
> cause of the crash (I'm not familiar with that part of the code).
> gnn_at_ (CCed) may have a better idea of what's going on here.
> 
> Thanks.

In another thread I complained about permanent crashes on several "older" Intel
architectures (Ivy Bridge and earlier). It turned out that

options FLOWTABLE

in the kernel configuration, which has been part of my custom kernels for a long time, is
the culprit on those systems. Commenting out that option solved the
problem!

Interestingly, after also commenting out this option in the kernel config of the laptop
discussed in this thread, I have not been able - as of this writing - to reproduce the
crashes, so it may be that the same FLOWTABLE issue was being triggered by plugging and/or
unplugging the LAN cable.

Usually I was able to trigger the coredump after two or three rounds; this time I tried
it over ten times with no effect.

On the other hand, the NIC of the laptop doesn't negotiate 1 GBit/s with my switch;
it stays at 100 MBit/s. The switch is a Netgear GS110TP V2.

Regards,

oh
Received on Sat Nov 19 2016 - 17:48:35 UTC
