Re: call for bge(4) testers

From: Pyun YongHyeon <pyunyh_at_gmail.com> Date: Thu, 24 Aug 2006 09:26:32 +0900

On Wed, Aug 23, 2006 at 04:40:35PM +0400, Oleg Bulyzhin wrote:
 > On Wed, Aug 23, 2006 at 09:55:54AM +0900, Pyun YongHyeon wrote:
 > > On Wed, Aug 23, 2006 at 12:43:42AM +0400, Oleg Bulyzhin wrote:
 > >  > On Tue, Aug 22, 2006 at 02:44:34PM +0200, Michael Reifenberger wrote:
 > >  > > On Tue, 22 Aug 2006, Pyun YongHyeon wrote:
 > >  > > ...
 > >  > > >I'm not familiar with vge(4) and don't have hardware supported by
 > >  > > >vge(4). Because vge(4) supports a kind of interrupt moderation, there
 > >  > > >is a possibility of the same issue seen on em(4).
 > >  > > >If you want my blind patch I can send it to you.
 > >  > > >
 > >  > > Yes, please!
 > >  > > I can test it (on RELENG_6 though).
 > >  > 
 > >  > I have an idea why those timeouts can happen. Could you please test
 > >  > the attached patch? It may help (or may not). Either way it would be
 > >  > good to know the results.
 > >  > 
 > > 
 > > Since vge(4) uses an MTX_RECURSE mutex and the miibus(4) handler is
 > > protected with that mutex, I guess it wouldn't help much.
 > > I guess it needs a separate mutex to protect the miibus(4) handler,
 > > and the use of MTX_RECURSE should be removed.
 > 
 > Hmm.
 > 1) _ifmedia_upd() & _ifmedia_sts() functions are not called from the mii layer.
 > 2) As far as I can see, the MII layer is not protected by anything, unless
 > you explicitly acquire the driver lock prior to calling a mii_ function.
 > Locking the ifmedia callbacks should be done (though it may not help
 > with watchdog timeouts), otherwise we have a race on accessing PHY registers
 > (kern/98738).
 > 
 > As far as I can see, random watchdog timeouts were reported for the em, bge,
 > vge and sk drivers (and maybe others; those are the ones I remember).
 > All of them have unlocked _ifmedia_ functions.
 > 

AFAIK all known sk(4) bugs were fixed. If that's not the case, please let me know.

 > My idea was: perhaps, under certain conditions, concurrent access to the PHY
 > could lead to a hardware deadlock.
 > 

Yes. Because an MII bus access takes several steps to reach the PHY
registers, the operation shouldn't be interrupted until all pending
requests are served.
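
To illustrate the point, here is a minimal userspace sketch (not driver
code) of a two-step register read. Everything in it is an assumption made
for illustration: struct mii_model, the function names, and the idea of
modeling the bus as "latch an address, then read the data". It only shows
that a second caller slipping in between the two steps makes the first
caller read the wrong register, and that taking ownership of the bus for
the whole sequence avoids it.

```c
#include <assert.h>
#include <stdint.h>

#define MII_BMCR 0x00   /* basic mode control register */
#define MII_BMSR 0x01   /* basic mode status register */

/* Hypothetical model of a PHY behind a two-step access sequence. */
struct mii_model {
    uint16_t regs[32];  /* simulated PHY registers */
    int      latched;   /* register address latched by step 1 */
    int      owner;     /* 0 = bus free, otherwise id of the holder */
};

/* Step 1: latch the register address. */
void mii_start(struct mii_model *m, int reg) { m->latched = reg & 0x1f; }

/* Step 2: read back whatever register is currently latched. */
uint16_t mii_finish(const struct mii_model *m) { return m->regs[m->latched]; }

/* Try to take the bus; returns 1 on success, 0 if somebody else holds it. */
int mii_acquire(struct mii_model *m, int id)
{
    if (m->owner != 0)
        return 0;
    m->owner = id;
    return 1;
}

void mii_release(struct mii_model *m, int id)
{
    assert(m->owner == id);
    m->owner = 0;
}

/* Unserialized: caller B runs its step 1 between A's two steps. */
uint16_t read_bmsr_interleaved(struct mii_model *m)
{
    mii_start(m, MII_BMSR);     /* A, step 1 */
    mii_start(m, MII_BMCR);     /* B, step 1: overwrites A's latch */
    return mii_finish(m);       /* A, step 2: returns BMCR, not BMSR */
}

/* Serialized: B is refused while A owns the bus, so A's latch survives. */
uint16_t read_bmsr_serialized(struct mii_model *m)
{
    uint16_t v;
    int ok = mii_acquire(m, 1); /* A takes the bus */

    assert(ok);
    (void)ok;
    mii_start(m, MII_BMSR);     /* A, step 1 */
    if (mii_acquire(m, 2)) {    /* B tries; fails while A holds the bus */
        mii_start(m, MII_BMCR);
        mii_release(m, 2);
    }
    v = mii_finish(m);          /* A, step 2: still BMSR */
    mii_release(m, 1);
    return v;
}
```

In a real driver the ownership token would be a mutex held across the
whole access sequence; the try-lock here only keeps the example
deterministic without threads.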

I'm not sure whether you remember my mail about an MII lock, which
modified the mii_phy_probe API to take an additional mutex. The driver
mutex could then be used for MII bus access/callbacks.
If the interface is up/running and auto-negotiation is in progress, the
MII layer inspects the BMSR register periodically to learn the state
of the link. If you run ifconfig(8) during that time to query the state
of the link or to change media type/duplex, it will also access PHY
registers. Normally it would end up with "link states coalesced"
messages.

As you know, the two callbacks (vge_ifmedia_upd/vge_ifmedia_sts) end
up calling mii_mediachg() or mii_pollstat(), which in turn access
PHY registers. So if MII access is properly serialized we wouldn't
get stale data. I guess your fix solves that by protecting the
callbacks with the driver mutex, but it wouldn't fix other cases.
For example, see the vge_miibus_statchg MII interface.
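
A rough userspace model of that concern, under stated assumptions: none
of these names exist in vge(4), and model_other_path() is a placeholder
for any entry point that reaches the PHY without going through the
locked callbacks (the sketch does not claim which real path that is). It
only counts how many simulated PHY accesses happen with and without the
lock held.

```c
#include <assert.h>

/* Hypothetical stand-in for a driver softc with a single driver lock. */
struct softc_model {
    int locked;             /* 1 while the driver lock is held */
    int locked_accesses;    /* PHY accesses made under the lock */
    int unlocked_accesses;  /* PHY accesses made without it (races) */
};

void model_lock(struct softc_model *sc)
{
    assert(!sc->locked);    /* non-recursive: must not already be held */
    sc->locked = 1;
}

void model_unlock(struct softc_model *sc)
{
    assert(sc->locked);
    sc->locked = 0;
}

/* Stands in for mii_pollstat()/mii_mediachg(): anything touching the PHY. */
void model_phy_access(struct softc_model *sc)
{
    if (sc->locked)
        sc->locked_accesses++;
    else
        sc->unlocked_accesses++;
}

/* ifmedia status callback once it is wrapped in the driver lock. */
void model_ifmedia_sts(struct softc_model *sc)
{
    model_lock(sc);
    model_phy_access(sc);
    model_unlock(sc);
}

/* Some other path to the PHY that the callback fix does not cover. */
void model_other_path(struct softc_model *sc)
{
    model_phy_access(sc);
}
```

Locking only the callbacks leaves unlocked_accesses non-zero as soon as
any other path runs, which is why serializing at the MII access level
covers more cases than locking individual callers.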

 > 
 > > vge(4) also has a bug:
 > > if the mbuf chain is too long (7 or more) and defragmentation with
 > > m_defrag(9) fails, it would access an invalid mbuf chain.
 > > All of this requires a lot of work and needs real hardware.
 > > Oleg, if you have the hardware, would you fix it?
 > 
 > Unfortunately I don't have vge hardware.

-- 
Regards,
Pyun YongHyeon