On Wed, Aug 23, 2006 at 02:04:20PM +0400, Gleb Smirnoff wrote:
>On Wed, Aug 23, 2006 at 06:55:04PM +0900, Pyun YongHyeon wrote:
>P> On Wed, Aug 23, 2006 at 01:37:41PM +0400, Gleb Smirnoff wrote:
>P> > On Tue, Aug 22, 2006 at 01:20:23PM +0900, Pyun YongHyeon wrote:
>P> > P> After fixing em(4) watchdog bug, I looked over bge(4) and I think
>P> > P> bge(4) may suffer from the same issue. So if you have seen occasional
>P> > P> watchdog timeout errors on bge(4) please give the attached patch a try.
>P> > P> The patch does fix false watchdog timeout error only.
>P> > P> Typical phenomena for false watchdog timeout error are
>P> > P>  o polling(4) fix the issue
>P> > P>  o random watchdog error
>P> > P>
>P> > P> If my patch fix the issue you could see the following messages.
>P> > P> "missing Tx completion interrupt!" or "link lost -- resetting"
>P> >
>P> > I still think that this fix is incorrect. It is just a more gentle
>P> > recovery from a fake watchdog timeout.
>P>
>P> Its sole purpose is to reinitialize hardware for real watchdog
>P> timeouts. It's not fix for general watchdog timeouts. As I said other
>P> mails, the fake watchdog timeout(losing Tx interrupts) for hardwares
>P> with Tx interrupt moderation capability could be normal thing. So I
>P> just want to know bge(4) also has the same feature(bug).
>
>According to several emails about em(4) fake watchdog timeouts, the
>problem can be fixed by setting debug.mpsafenet=0. This makes me think
>that the problem isn't caused by TX interrupt moderation, but some race
>in the kernel. Really, if_slowtimo() doesn't acquire driver lock before
>checking and modifying the if_timer field.
>
>Afaik, NIC drivers that can do interrupt moderation should set a timer
>to a sane value, based on interrupt moderation settings, so that the
>watchdog won't be ever called fakely.

What is interrupt moderation?

-aW

Received on Wed Aug 23 2006 - 22:18:32 UTC
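
The quoted suggestion of sizing the watchdog timer from the interrupt moderation settings could look roughly like the C sketch below. It is only an illustration under stated assumptions: the watchdog counter is assumed to be decremented about once per second (as if_timer is by if_slowtimo()), and the driver is assumed to know its worst-case Tx completion delay in microseconds. The names watchdog_secs, tx_delay_us and min_secs are hypothetical and are not taken from em(4) or bge(4).

```c
#include <stdint.h>

/*
 * Hypothetical helper, not from any real driver.
 *
 * tx_delay_us: assumed worst-case delay (in microseconds) that interrupt
 *              moderation may add before a Tx completion is reported.
 * min_secs:    floor for the watchdog timeout, in seconds.
 *
 * Returns a timeout in seconds that is never shorter than the moderation
 * delay, so a merely delayed interrupt does not trip the watchdog.
 */
static int
watchdog_secs(uint32_t tx_delay_us, int min_secs)
{
	/* Round the delay up to whole seconds; widen to 64 bits so the
	 * addition cannot overflow for large inputs. */
	uint64_t delay_secs = ((uint64_t)tx_delay_us + 999999u) / 1000000u;

	/* One extra second of slack: the periodic decrement may fire
	 * right after the timer has been armed. */
	int secs = (int)delay_secs + 1;

	return (secs > min_secs ? secs : min_secs);
}
```

With a floor of 5 seconds, a moderation delay of a few hundred microseconds leaves the timeout at the floor; only a delay of several seconds would raise it. This does nothing about the locking race described in the quoted text, which would still need the timer to be checked and modified under the driver lock.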