Re: CURRENT: re(4) crashing system

From: YongHyeon PYUN <pyunyh_at_gmail.com>
Date: Tue, 25 Oct 2016 11:05:38 +0900
On Mon, Oct 24, 2016 at 02:03:37PM +0200, O. Hartmann wrote:
> On Mon, 24 Oct 2016 14:14:00 +0900
> YongHyeon PYUN <pyunyh_at_gmail.com> wrote:
> 
> > On Sun, Oct 23, 2016 at 01:25:38PM +0200, Hartmann, O. wrote:
> > > I tried to report earlier here that CURRENT does have some serious
> > > problems right now and one of those problems seems to be triggered by
> > > the recent re(4) driver. The problem is also present in recen 11-STABLE!
> > > 
> > > Below, you'll find pciconf-output reagrding the device on a Lenovo E540
> > > Laptop I can test on and trigger the problem.
> > > 
> > > The phenomenon is that this NIC does not negotiate 1000baseTX, it is
> > > always falling back to 100baseTX although the device claims to be a 1
> > > GBit capable device.
> > > 
> > > When I try to put the device manually into 1000basTX mode via
> > > 
> > > ifconfig re0 media 1000baseTX mediaopt full-duplex (with re(4) driver)
> > > 
> > > it is possible to crash the system. The system also crashes when
> > > plugging/unplugging the LAN cord - I guess the renegotiation is
> > > triggering this crash immediately.
> > > 
> > > I tried with several switches and routers capable of 1 GBit and it
> > > seems to be independent from the network hardware in use.
> > > 
> > > I tried to capture a backtrace when the kernel crashes, but I do not
> > > know how to save the the kernel debugger output. Although I configured
> > > according the handbook debugging, there is no coredump at all.
> > > 
> > > Advice is appreciated - if anybody is interesetd in solving this. 
> > >   
> > 
> > There were several instability reports on re(4).  I vaguely guess
> > it would be related with some missing initializations for certain
> > controllers.  Unfortunately, there is no publicly available
> > datasheet for those controllers and it's not likely to get access
> > to it in near future.  It seems vendor's FreeBSD driver accesses
> > lots of magic registers as well as loading DSP fixups.  I have no
> > idea what it wants to do and re(4) used to heavily rely on power-on
> > default register values.  Engineering samples I have do not show
> > instabilities so it wouldn't be easy to identify the issue.
> > 
> > Probably the first step to address the issue would be identifying
> > those chips and narrowing down the scope of guessing.  Would you
> > show me the dmesg output(re(4) and regphy(4) only)?  pciconf(8)
> > output is useless here since RealTek uses the same PCI id for
> > PCIe variants.
> > 
> > BTW, I was told that the vendor's FreeBSD driver seems to work fine
> > for normal usage pattern.  The vendor's driver triggered an instant
> > panic and lacked H/W offloading features in the past.  It might
> > have changed though.
> 
> The problemacy with re(4) drivers arose again, when I bought some "green"
> equipment, mainly switches, which reduces power emission on short cables or
> non-connected ports. This brought down some servers with re(4) chipsets
> immediately and I had no clue what happend. I do not know whether this is a

I'm not sure but it's likely the issue is related with EEE/Green
Ethernet handling. EEE is negotiated feature with link partner. If
you directly connect your laptop to non-EEE capable link partner
like other re(4) box without switches you may be able to tell
whether the issue is EEE/Green Ethernet related one or not.

> single fate so to speak, or this problem will arise for others, too. We
> exchanged on serving hardware all Realtek NICs with those from Intel, and
> luckily some server mainboards already have Intel PHY or NICs. The Broadcom
> devices we have on some older Fujitus hardware is also stable like a charme,
> even with the new power saving switches.
> 

bge(4) also lacks EEE support(Publicly available datasheet is too
sanitized one).  bge(4) firmware probably does not announce EEE
capability by default in link establishment while recent re(4)
devices seem to unconditionally announce EEE.  Generally EEE
handling requires a kind of handshake for link state change from
MAC/PHY.

> While we can swap on server or workstation platforms the NIC, it is almost
> impossible on laptops and the number of laptops with realtek chips seems to
> grow. It is a pity that the venodr of the chipsets reject supporting other OSes
> than Windows - or in some rare cases only Linux. After you wrote the answer, I
> checked on the net who's suiatble drivers and the situation seems bad for
> almost all OSes apart from commercial ones like Windooze and Apple OS X.
> 
> As soon as I get hands on the laptop again, I'll send the requested
> informations. I know that I played around with re(4) and rgephy(4) in the
> kernel, the rgephy(4) showed up on the dmesg, but I didn't see any effect -
> except that it offered some additional "media xxx-options-xxx" mostly appended
> with "flow" - but rying brought also down the system as pluggin or unplugging.

rgephy(4) will show recognized PHY H/W model. Another information
I'd like to know is OUI information of the PHY.  The OUI
information could be get with `devinfo -rv | grep rgephy`.

The "flow" output of media indicates it negotiated ethernet
flow-control with link partner.  rgephy(4) used to announce
autonegotiation even when manual setting is requested with
ifconfig.  It was to workaround HW issues seen in the past.  
You can disable the use of autonegotiation in manual media
selection with flag0 option. See rgephy(4) for more information.
Not sure whether that option helps though.

> The last kernel I compiled was then without rgephy(4) - the NIC worked as
> expected, but pluggin/unplugging or having some power-down activities on a
> Netgear SoHo green-pwer switch brings the system down as usual. 

If you use re(4) without rgephy(4) it will use ukphy(4) which is
completely dumb on link state detection of re(4) controller. Link
state detection requires non-PHY register access on re(4) so using
ukphy(4) is not recommended.
Received on Tue Oct 25 2016 - 00:05:45 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:41:08 UTC