Re: CURRENT: re(4) crashing system

From: O. Hartmann <ohartman_at_zedat.fu-berlin.de>
Date: Mon, 24 Oct 2016 14:03:37 +0200
On Mon, 24 Oct 2016 14:14:00 +0900
YongHyeon PYUN <pyunyh_at_gmail.com> wrote:

> On Sun, Oct 23, 2016 at 01:25:38PM +0200, Hartmann, O. wrote:
> > I tried to report earlier here that CURRENT does have some serious
> > problems right now and one of those problems seems to be triggered by
> > the recent re(4) driver. The problem is also present in recent 11-STABLE!
> > 
> > Below, you'll find pciconf-output regarding the device on a Lenovo E540
> > Laptop I can test on and trigger the problem.
> > 
> > The phenomenon is that this NIC does not negotiate 1000baseTX, it is
> > always falling back to 100baseTX although the device claims to be a 1
> > GBit capable device.
> > 
> > When I try to put the device manually into 1000baseTX mode via
> > 
> > ifconfig re0 media 1000baseTX mediaopt full-duplex (with re(4) driver)
> > 
> > it is possible to crash the system. The system also crashes when
> > plugging/unplugging the LAN cord - I guess the renegotiation is
> > triggering this crash immediately.
> > 
> > I tried with several switches and routers capable of 1 GBit and it
> > seems to be independent from the network hardware in use.
> > 
> > I tried to capture a backtrace when the kernel crashes, but I do not
> > know how to save the kernel debugger output. Although I configured
> > debugging according to the handbook, there is no coredump at all.
> > 
> > Advice is appreciated - if anybody is interested in solving this.
> >   
> 
> There were several instability reports on re(4).  I vaguely guess
> it would be related with some missing initializations for certain
> controllers.  Unfortunately, there is no publicly available
> datasheet for those controllers and it's not likely to get access
> to it in near future.  It seems vendor's FreeBSD driver accesses
> lots of magic registers as well as loading DSP fixups.  I have no
> idea what it wants to do and re(4) used to heavily rely on power-on
> default register values.  Engineering samples I have do not show
> instabilities so it wouldn't be easy to identify the issue.
> 
> Probably the first step to address the issue would be identifying
> those chips and narrowing down the scope of guessing.  Would you
> show me the dmesg output (re(4) and rgephy(4) only)?  pciconf(8)
> output is useless here since RealTek uses the same PCI id for
> PCIe variants.
> 
> BTW, I was told that the vendor's FreeBSD driver seems to work fine
> for normal usage pattern.  The vendor's driver triggered an instant
> panic and lacked H/W offloading features in the past.  It might
> have changed though.

The problems with the re(4) driver arose again when I bought some "green"
equipment, mainly switches that reduce power output on short cables or
unconnected ports. This brought down some servers with re(4) chipsets
immediately, and I had no clue what had happened. I do not know whether this
is an isolated case, so to speak, or whether the problem will arise for
others, too. On our server hardware we replaced all Realtek NICs with Intel
ones, and luckily some server mainboards already have Intel PHYs or NICs. The
Broadcom devices we have on some older Fujitsu hardware also work like a
charm, even with the new power-saving switches.

While we can swap the NIC on server or workstation platforms, that is almost
impossible on laptops, and the number of laptops with Realtek chips seems to
grow. It is a pity that the vendor of the chipsets refuses to support OSes
other than Windows - or, in some rare cases, Linux. After you wrote your
answer, I checked on the net for suitable drivers, and the situation seems bad
for almost all OSes apart from commercial ones like Windooze and Apple OS X.

As soon as I get my hands on the laptop again, I'll send the requested
information. I know that I played around with re(4) and rgephy(4) in the
kernel; rgephy(4) showed up in the dmesg, but I didn't see any effect -
except that it offered some additional "media xxx-options-xxx", mostly appended
with "flow" - but trying them also brought down the system, just as plugging
or unplugging does. The last kernel I compiled was then without rgephy(4) - the
NIC worked as expected, but plugging/unplugging or some power-down activity on
a Netgear SoHo green-power switch brings the system down as usual.
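
For the requested dmesg excerpt, something like the following could be used to
keep only the re(4) and rgephy(4) lines. This is only a sketch: the function
name filter_re_lines is made up for illustration, and the pattern assumes that
the relevant lines mention the device as "re<N>" or "rgephy<N>" (for example
at the start of a line or after "on").

```sh
# Hypothetical helper: keep only lines that mention an re(4) or rgephy(4)
# device instance such as "re0" or "rgephy0" as a separate word.
filter_re_lines() {
    grep -E '(^|[[:space:]])(re|rgephy)[0-9]+(:|[[:space:]]|$)'
}

# Possible usage on the affected machine (not run here):
#   dmesg | filter_re_lines
#   filter_re_lines < /var/run/dmesg.boot
```

Reading /var/run/dmesg.boot rather than the live message buffer should help if
the probe messages have already scrolled out of the buffer.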
Received on Mon Oct 24 2016 - 10:03:51 UTC
