Re: msk watchdog timeout

From: Pyun YongHyeon <pyunyh_at_gmail.com>
Date: Fri, 26 Dec 2008 11:23:59 +0900
On Wed, Dec 24, 2008 at 03:44:39PM +0000, Bruce Simpson wrote:
 > Hi,
 > 
 > I just observed a similar issue with the onboard msk0 on my ASUS Vintage 
 > AH-1.
 > 
 > The symptoms occurred in the last hour, when attempting to download the 
 > 7.1-RC1-i386-dvd1.iso.gz from ftp.plig.net mirror.
 > It is triggered when the data rate of the wget job hit around 890 KiB/sec.
 > 
 > Let me know if you need a PR raised for this.
 > 
 > uname -a:
 > %%%
 > FreeBSD anglepoise.lon.incunabulum.net 7.1-PRERELEASE FreeBSD 
 > 7.1-PRERELEASE #0: Wed Dec  3 17:03:33 GMT 2008     
 > root_at_anglepoise.lon.incunabulum.net:/home/obj/usr/src/sys/ANGLEPOISE7  amd64
 > %%%
 > 
 > dmesg output from syslog:
 > %%%
 > Dec 24 15:08:05 anglepoise kernel: msk0: watchdog timeout (missed Tx 
 > interrupts)
 > -- recovering
 > Dec 24 15:09:32 anglepoise kernel: msk0: watchdog timeout
 > Dec 24 15:09:32 anglepoise kernel: msk0: link state changed to DOWN
 > Dec 24 15:09:34 anglepoise kernel: msk0: link state changed to UP
 > Dec 24 15:09:46 anglepoise kernel: msk0: watchdog timeout (missed Tx 
 > interrupts)
 > -- recovering
 > Dec 24 15:10:08 anglepoise kernel: msk0: watchdog timeout (missed Tx 
 > interrupts)
 > -- recovering
 > Dec 24 15:10:32 anglepoise kernel: msk0: watchdog timeout (missed Tx 
 > interrupts)
 > -- recovering
 > Dec 24 15:11:03 anglepoise kernel: msk0: watchdog timeout
 > Dec 24 15:11:03 anglepoise kernel: msk0: link state changed to DOWN
 > Dec 24 15:11:05 anglepoise kernel: msk0: link state changed to UP
 > Dec 24 15:12:10 anglepoise kernel: msk0: watchdog timeout
 > Dec 24 15:12:10 anglepoise kernel: msk0: link state changed to DOWN
 > Dec 24 15:12:12 anglepoise kernel: msk0: link state changed to UP
 > Dec 24 15:12:20 anglepoise kernel: msk0: watchdog timeout (missed Tx 
 > interrupts)
 > -- recovering
 > Dec 24 15:12:58 anglepoise last message repeated 3 times
 > Dec 24 15:14:28 anglepoise last message repeated 12 times
 > Dec 24 15:14:29 anglepoise kernel: msk0: link state changed to DOWN
 > Dec 24 15:14:31 anglepoise kernel: msk0: link state changed to UP
 > Dec 24 15:14:39 anglepoise kernel: msk0: watchdog timeout (missed Tx 
 > interrupts)
 > -- recovering
 > Dec 24 15:15:06 anglepoise last message repeated 3 times
 > Dec 24 15:15:21 anglepoise kernel: msk0: watchdog timeout (missed Tx 
 > interrupts)
 > -- recovering
 > Dec 24 15:18:27 anglepoise dhclient[339]: connection closed
 > Dec 24 15:18:27 anglepoise dhclient[339]: exiting.
 > Dec 24 15:18:33 anglepoise kernel: msk0: link state changed to DOWN
 > Dec 24 15:18:35 anglepoise kernel: msk0: link state changed to UP
 > Dec 24 15:18:35 anglepoise kernel: msk0: link state changed to DOWN
 > Dec 24 15:18:37 anglepoise kernel: msk0: link state changed to UP
 > Dec 24 15:18:46 anglepoise kernel: msk0: watchdog timeout (missed Tx 
 > interrupts)
 > -- recovering
 > Dec 24 15:18:49 anglepoise kernel: msk0: link state changed to DOWN
 > Dec 24 15:18:51 anglepoise kernel: msk0: link state changed to UP
 > Dec 24 15:19:00 anglepoise kernel: msk0: watchdog timeout (missed Tx 
 > interrupts)
 > -- recovering
 > Dec 24 15:19:38 anglepoise last message repeated 4 times
 > Dec 24 15:19:47 anglepoise kernel: msk0: watchdog timeout (missed Tx 
 > interrupts)
 > -- recovering
 > Dec 24 15:25:48 anglepoise last message repeated 6 times
 > Dec 24 15:28:04 anglepoise kernel: msk0: promiscuous mode enabled
 > Dec 24 15:28:56 anglepoise kernel: msk0: promiscuous mode disabled
 > Dec 24 15:29:41 anglepoise sudo:      bms : TTY=ttyp4 ; PWD=/home/bms ; USER=root ; COMMAND=/sbin/reboot
 > Dec 24 15:29:41 anglepoise reboot: rebooted by bms
 > Dec 24 15:29:41 anglepoise syslogd: exiting on signal 15
 > %%%
 > 
 > The DHCP lease is lost, msk0 appears to stop receiving traffic.
 > 
 > I *did* re-patch the cable on my switch around this point in time, and 
 > it's possible this triggered the condition.
 > Perhaps this is a receive DMA descriptor problem, or a PHY interrupt 

No. If that were the root cause of the issue, msk(4) would have shown
"Rx descriptor error" on the console. Of course, this assumes the
controller can detect such errors.

 > problem?
 > 

msk(4) doesn't rely on the PHY status change interrupt, although the
interrupt is enabled by default.
My vague guess is that link state change handling in msk(4) is not
right, as msk(4) only checks for link UP/DOWN events. I'm working on
improving link state handling to support the 88E8040, but it still
requires a lot of code and workarounds.
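
To get a rough picture of how often each kind of event shows up in
a log like the one quoted above, the messages can be tallied. This
is only a minimal sketch: it assumes the syslog lines have been
unwrapped to one message per line, and the category names
(tx_recover, watchdog_reset, link_down, link_up) are labels made up
for this example, not msk(4) terminology.

```python
import re
from collections import Counter

# Message texts are taken from the quoted syslog excerpt. The category
# names are hypothetical labels for this sketch only.
PATTERNS = [
    ("tx_recover", re.compile(r"msk\d+: watchdog timeout \(missed Tx interrupts\)")),
    ("watchdog_reset", re.compile(r"msk\d+: watchdog timeout\s*$")),
    ("link_down", re.compile(r"msk\d+: link state changed to DOWN")),
    ("link_up", re.compile(r"msk\d+: link state changed to UP")),
]
REPEAT = re.compile(r"last message repeated (\d+) times")


def tally_msk_events(lines):
    """Count msk events in unwrapped syslog lines.

    A "last message repeated N times" line adds N to the most recent
    msk event, if the line just before the repeat run was one.
    """
    counts = Counter()
    last = None
    for line in lines:
        line = line.rstrip()
        repeat = REPEAT.search(line)
        if repeat:
            if last is not None:
                counts[last] += int(repeat.group(1))
            continue
        last = None
        for name, pattern in PATTERNS:
            if pattern.search(line):
                counts[name] += 1
                last = name
                break
    return counts
```

Comparing the watchdog_reset count with the link_down count gives a
first hint of whether the link flaps follow the plain "watchdog
timeout" messages, but it says nothing about the cause.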

 > I confirmed that neither the cabling itself nor other network 
 > infrastructure were responsible.
 > 

Ok.

Yukon controllers look really buggy and seem to require a different
workaround for each controller/revision. There was a fix for one of
the silicon bugs of Yukon controllers, so it would be even better if
you could apply the workaround from HEAD (r183346).
However, one user also reported watchdog timeouts on CURRENT, so
there still seem to be unresolved issues. I couldn't reproduce
the issue on my box, but would you try the attached patch?
Also, please show me the dmesg output so I can see what revision you
have (this information is not available from pciconf(8)).
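
If the full dmesg is long, the relevant lines can be picked out
first. A minimal sketch, assuming only that the controller and
interface lines start with a device name such as "mskc0:" or
"msk0:"; the function name msk_probe_lines is made up for this
example, and nothing is assumed about the rest of each line.

```python
import re

# Lines logged for the controller ("mskc0: ...") or the network
# interface ("msk0: ..."). Only the device-name prefix is assumed.
MSK_LINE = re.compile(r"^mskc?\d+: ")


def msk_probe_lines(dmesg_text):
    """Return the lines of a dmesg dump that begin with an msk/mskc device name."""
    return [line for line in dmesg_text.splitlines() if MSK_LINE.match(line)]
```

The input can be the saved boot messages, for example the contents
of /var/run/dmesg.boot, or the output of dmesg(8) captured to a
file.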

-- 
Regards,
Pyun YongHyeon

Received on Fri Dec 26 2008 - 01:24:09 UTC
