Re: amd64/115126: [nfe] nfe0: watchdog timeout (missed Tx interrupts) -- recovering (UP with SCHED_ULE)

From: Luigi Rizzo <rizzo_at_iet.unipi.it>
Date: Wed, 23 Apr 2008 11:11:27 +0200
On Wed, Apr 23, 2008 at 05:22:40PM +0900, Pyun YongHyeon wrote:
> On Tue, Apr 22, 2008 at 09:28:39AM +0200, Luigi Rizzo wrote:
>  > related to this bug, i am seeing similar problems with RELENG_7 and amd64,
>  > with an ASUS M2N-VM DVI motherboard
>  > http://www.asus.com/products.aspx?modelmenu=1&model=1841&l1=3&l2=101&l3=567&l4=0
>  > and an Athlon64-BE2400 dual core CPU.
>  > 
>  > Under heavy load, e.g. scp-ing a large file over the local network,
>  > and at the same time doing a buildkernel or building a port,
>  > and with X11 active (using the 'vesa' xorg driver)
>  > the network card stalls and doesn't recover - i waited over 10 minutes
>  > hoping for the watchdog or some timeout to kick in, the only way
>  > to bring the link back up was
>  > 
>  > 	ifconfig nfe0 down ; ifconfig nfe0 up
>  > 	dhclient nfe0
>  > 
>  > doing only ifconfig down/up or only dhclient did not help, i needed both.
...
> Your BIOS may have an ASF-related option for the onboard NIC.
> Try toggling that option and see how it goes.
...
> Just a vague guess: how about disabling MSI/MSI-X in loader.conf?
> (hw.nfe.msi_disable = "1", hw.nfe.msix_disable = "1")
> If you are using jumbo frame, try disabling it too.
> 
>  > Hope this helps...
>  > 
> 
> It would be even better if you could post the verbose boot messages
> related to nfe(4) and the PHY driver.

will try to do all the above, but upon further investigation the
problem appears even on i386 and really seems related to the
receive queue filling up and the condition not being detected
due to a race.
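
To make the suspected race concrete, here is a minimal sketch in C of
the general pattern, not the actual nfe(4) code. Everything in it is
hypothetical: the names (rx_ring, hw_receive, rx_drain, rx_intr_racy,
rx_intr_recheck, late_frame, RX_RING_SIZE), the ring layout, and the
assumption that the NIC does not latch a receive event that occurs
while its interrupt is masked. Whether the real hardware behaves that
way is exactly what remains to be checked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RX_RING_SIZE 8

/* Hypothetical model of a receive ring; not the real nfe(4) layout. */
struct rx_ring {
    bool ready[RX_RING_SIZE]; /* true: slot holds a received frame */
    size_t prod;              /* next slot the "hardware" fills */
    size_t cons;              /* next slot the driver examines */
    bool intr_enabled;        /* interrupt mask, owned by the driver */
    bool intr_pending;        /* raised by the "hardware" */
    unsigned delivered;       /* frames handed to the stack */
};

/*
 * "Hardware" side: store one frame. ASSUMPTION: an interrupt is raised
 * only if unmasked at arrival time; a masked event is not latched.
 * Returns false when the ring is full and the frame is dropped.
 */
static bool hw_receive(struct rx_ring *r)
{
    if (r->ready[r->prod])
        return false;
    r->ready[r->prod] = true;
    r->prod = (r->prod + 1) % RX_RING_SIZE;
    if (r->intr_enabled)
        r->intr_pending = true;
    return true;
}

/* Driver side: consume every frame currently visible in the ring. */
static unsigned rx_drain(struct rx_ring *r)
{
    unsigned n = 0;

    assert(r->cons < RX_RING_SIZE);
    while (r->ready[r->cons]) {
        r->ready[r->cons] = false;
        r->cons = (r->cons + 1) % RX_RING_SIZE;
        n++;
    }
    r->delivered += n;
    return n;
}

/* Test hook standing in for a frame that arrives at a bad moment. */
typedef void (*race_hook)(struct rx_ring *);

static void late_frame(struct rx_ring *r)
{
    (void)hw_receive(r);
}

/* Racy handler: drain, then unmask. A frame in between is never seen. */
static void rx_intr_racy(struct rx_ring *r, race_hook hook)
{
    r->intr_enabled = false;
    r->intr_pending = false;
    rx_drain(r);
    if (hook != NULL)
        hook(r);              /* the race window */
    r->intr_enabled = true;
}

/* Safer handler: unmask, then look at the ring again before leaving. */
static void rx_intr_recheck(struct rx_ring *r, race_hook hook)
{
    r->intr_enabled = false;
    r->intr_pending = false;
    for (;;) {
        rx_drain(r);
        if (hook != NULL) {
            hook(r);          /* same race window, injected once */
            hook = NULL;
        }
        r->intr_enabled = true;
        if (!r->ready[r->cons])
            break;            /* ring really is empty */
        r->intr_enabled = false;
    }
}
```

With rx_intr_racy() and the late_frame hook, the late frame stays in
the ring with no interrupt pending, so nothing ever drains it; under
load the ring would then fill and reception would stall. The re-check
variant picks the same frame up before returning.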

Things like this used to happen in the past in several network drivers,
and there is a comment suggesting the same thing in one of the
commit logs for the openbsd nfe driver. So that's the part i am
going to investigate (i have strong motivations with 5 such machines
in my lab...)

My preliminary question is the following: is the 'nfe' driver just
an adaptation of some other driver (possibly guessing at the
way the NIC synchronizes with the CPU), or has someone
carefully studied that specific issue?

	cheers
	luigi
Received on Wed Apr 23 2008 - 07:09:14 UTC
