Re: threadlock and msk watchdog timeout

From: Pyun YongHyeon <pyunyh_at_gmail.com>
Date: Fri, 13 Jul 2007 19:08:29 +0900
On Fri, Jul 13, 2007 at 04:43:25PM +0800, Li-Lun Wang (Leland Wang) wrote:
 > -----BEGIN PGP SIGNED MESSAGE-----
 > Hash: SHA1
 > 
 > Hi,
 > 
 > After making world a couple of days ago, my msk(4) became very
 > unstable.  Under moderate network load, the interface hung and I
 > received
 > 
 > 	kernel: msk0: watchdog timeout (missed Tx interrupts) -- recovering
 > 
 > at least once every several minutes and 
 > 
 > 	kernel: msk0: Rx FIFO overrun!
 > 
 > occasionally.
 > 
 > It was so annoying that I took the trouble of binary searching the
 > kernel version to find the one that destabilized my msk(4).
 > 
 > The outcome of the search turned out to be strange.  Instead of
 > finding a date after which msk(4) became so very unstable, it *seemed*
 > that the older the kernel version the stabler msk(4) I got, and the
 > newer the kernel version the easier and more often msk(4) hung.
 > 
 > I managed to pin down that with the kernel as of 2007.06.04.12.00.00,
 > it seemed not to give me any msk watchdog timeout at all, and that
 > with the kernel as of 2007.06.05.12.00.00, msk(4) began to hang and
 > the watchdog began to time out once in a while.  There may be a later
 > commit that made my msk(4) even more unstable, but I am not sure about
 > this part as it is not easy to measure the level of "unstableness" of
 > the network.
 > 
 > It seems that the most significant commit between 2007.06.04.12.00.00
 > and 2007.06.05.12.00.00 was threadlock by jeff_at_.  I don't know why or
 > how it would affect msk(4), though.  I was using SCHED_SMP on a C2D,
 > but switched back to SCHED_ULE when I did the search.
 > 
 > I discovered a couple other funny phenomena during the search that may
 > also suggest this be related to threadlock.  One is that msk(4) seemed
 > to hang less frequently when the system was busy building world or
 > kernel.  The other thing is that I seemed to be able to help unhang
 > the interface by switching the input focus in X Window by moving my
 > mouse cursor to another window.
 > 
 > My result might not be accurate, though, as I only rebuilt the kernel,
 > not the whole world, when I did the search.
 > 

Does msk(4) use a shared interrupt?
Show me the output of "vmstat -i".
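
In that output, a shared interrupt usually appears as one irq line
naming more than one device. As a rough illustration only, the sketch
below scans saved "vmstat -i" text for such lines. The sample output,
the device names in it, and the helper name shared_irqs are made up
for the example, and the exact column layout is an assumption that
may differ between FreeBSD versions.

```python
import re

# Hypothetical sample of "vmstat -i" output; the numbers and device
# names are invented for illustration, not taken from a real machine.
SAMPLE = """\
interrupt                          total       rate
irq1: atkbd0                         523          0
irq16: msk0 uhci0                 104233         12
irq256: em0                         9999          1
Total                             114755         13
"""

# Assumed line shape: "irqN: dev [dev ...]   <total>   <rate>"
LINE_RE = re.compile(r"^(irq\d+):\s+(.*?)\s+(\d+)\s+(\d+)\s*$")


def shared_irqs(vmstat_output):
    """Return {irq: [devices]} for irq lines that name more than one device."""
    shared = {}
    for line in vmstat_output.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # header, "Total" line, or an unexpected format
        devices = match.group(2).split()
        if len(devices) > 1:
            shared[match.group(1)] = devices
    return shared


if __name__ == "__main__":
    for irq, devices in shared_irqs(SAMPLE).items():
        print(irq, "is shared by", ", ".join(devices))
```

If msk0 turns up on the same irq line as another device, the
interrupt is shared.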

-- 
Regards,
Pyun YongHyeon
Received on Fri Jul 13 2007 - 08:08:38 UTC
