Re: Broadcom bge(4) panics while shutting down

From: John Baldwin <jhb_at_freebsd.org>
Date: Thu, 14 May 2009 17:00:40 -0400
On Thursday 14 May 2009 3:47:16 pm Xin LI wrote:
> Hi, Alexander,
> 
> Alexander Sack wrote:
> > Hello:
> >
> > Under heavy traffic (100% utilization GIGE on a 2 port BGE card)
> > running BGE CURRENT driver I see panics on shutdown.  The reason is
> > because bge_rxeof() while processing its RX ring of BD's drops the
> > softc lock when it hands it off to its input function.  If bge_stop()
> > is waiting for it, it will then proceed to acquire lock and then
> > quiesce the hardware (resetting the card, clearing out BDs etc.).  Once
> > bge_stop() releases the softc lock, then bge_rxeof() under an
> > interrupt context (no polling here) will reacquire and continue to
> > process the ring which is a bad idea.  It should check to see if the
> > card is still running before continuing processing BDs (i.e. once
> > IF_DRV_RUNNING has been reset by bge_stop(), bge_rxeof() is done, bail
> > out).
> >
> > Here is my first go around with this patch:
> >
> >
> > --- if_bge.c.CURRENT	2009-05-14 14:39:39.000000000 -0400
> > +++ if_bge.c	2009-05-14 14:39:24.000000000 -0400
> > @@ -3081,6 +3081,10 @@
> >  		uint16_t		vlan_tag = 0;
> >  		int			have_tag = 0;
> >
> > +		if (!(ifp->if_drv_flags & IFF_DRV_RUNNING)) {
> > +			return;
> > +		}
> > +
> >  #ifdef DEVICE_POLLING
> >  		if (ifp->if_capenable & IFCAP_POLLING) {
> >  			if (sc->rxcycles <= 0)
> >
> >
> > This prevents any panics during shutdown under heavy load and AS IT
> > TURNS out (I feel stupid for not looking) that em(4) already had this
> > check in its em_rxeof() function (right at the top of the loop).  I'm
> > more than happy changing it to the em style but above seems reasonable
> > to me though I have to verify there isn't anything missing off the
> > loop from a hardware standpoint (I don't think so because bge_stop()
> > did all the dirty work so I believe touching any registers after that
> > from bge_rxeof() is a bad idea).
> >
> > Preliminary testing shows no more panics when starting and stopping
> > ports under heavy load (panics were almost immediate otherwise).
> >
> > Thoughts?
> 
> I think this would solve the problem, but I'm not sure whether it would
> add some overhead on the RX path.  There seems to be a race between
> bge_release_resources() and bge_intr(), so it might be a good idea to
> "drain" bge_intr() instead?

Usually just detach() drains the interrupt handler.  However, an 'ifconfig 
bge0 down' could probably provoke this as well.  I would probably do the 
check right after re-acquiring the lock at the bottom of the loop before 
touching anything else.
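
To make the placement concrete, here is a minimal userland sketch of that
pattern.  It is not the real if_bge.c code: struct fake_softc, fake_lock(),
fake_stop(), fake_input() and fake_rxeof() are all hypothetical stand-ins,
the "lock" is just a flag, and the stop is simulated from inside the input
callback so the example stays single-threaded.  The only point is where the
running check sits relative to re-acquiring the lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver softc; NOT the real struct bge_softc. */
struct fake_softc {
	bool	locked;		/* models the softc mutex being held */
	bool	running;	/* models IFF_DRV_RUNNING in if_drv_flags */
	int	rx_pending;	/* descriptors still waiting in the RX ring */
	int	rx_processed;	/* packets handed up to the stack */
	int	stop_after;	/* demo hook: stop arrives during this packet */
};

static void fake_lock(struct fake_softc *sc)   { sc->locked = true; }
static void fake_unlock(struct fake_softc *sc) { sc->locked = false; }

/* Models bge_stop(): takes the lock, clears the flag, tears down the ring. */
static void
fake_stop(struct fake_softc *sc)
{
	fake_lock(sc);
	sc->running = false;
	sc->rx_pending = 0;
	fake_unlock(sc);
}

/*
 * Models the input function, which is called with the lock dropped.
 * In the real race another thread runs the stop path in this window;
 * here it is triggered inline to keep the sketch deterministic.
 */
static void
fake_input(struct fake_softc *sc)
{
	sc->rx_processed++;
	if (sc->rx_processed == sc->stop_after)
		fake_stop(sc);
}

/* Models the RX loop: entered with the lock held, returns with it held. */
static void
fake_rxeof(struct fake_softc *sc)
{
	assert(sc->locked);
	while (sc->rx_pending > 0) {
		sc->rx_pending--;

		fake_unlock(sc);
		fake_input(sc);
		fake_lock(sc);

		/* Check right after re-acquiring, before touching the ring. */
		if (!sc->running)
			return;
	}
}
```

With five descriptors pending and the stop arriving during the second
packet, the loop hands up two packets and then returns with the lock held
instead of walking a ring that the stop path has already torn down.  The
check costs one flag test per packet, and only on the path that already
pays for a lock re-acquire.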

-- 
John Baldwin
Received on Thu May 14 2009 - 19:21:57 UTC
