On Thu, May 14, 2009 at 4:01 PM, Alexander Sack <pisymbol_at_gmail.com> wrote:
> On Thu, May 14, 2009 at 3:47 PM, Xin LI <delphij_at_delphij.net> wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> Hi, Alexander,
>>
>> Alexander Sack wrote:
>>> Hello:
>>>
>>> Under heavy traffic (100% utilization GIGE on a 2 port BGE card)
>>> running the BGE CURRENT driver, I see panics on shutdown. The reason is
>>> that bge_rxeof(), while processing its RX ring of BDs, drops the
>>> softc lock when it hands a packet off to its input function. If bge_stop()
>>> is waiting for it, bge_stop() will then acquire the lock and
>>> quiesce the hardware (resetting the card, clearing out BDs, etc.). Once
>>> bge_stop() releases the softc lock, bge_rxeof() (in interrupt
>>> context; no polling here) reacquires it and continues to
>>> process the ring, which is a bad idea. It should check whether the
>>> card is still running before processing any more BDs (i.e. once
>>> bge_stop() has cleared IFF_DRV_RUNNING, bge_rxeof() is done and should
>>> bail out).
>>>
>>> Here is my first go-around with this patch:
>>>
>>> --- if_bge.c.CURRENT	2009-05-14 14:39:39.000000000 -0400
>>> +++ if_bge.c	2009-05-14 14:39:24.000000000 -0400
>>> @@ -3081,6 +3081,10 @@
>>>  	uint16_t vlan_tag = 0;
>>>  	int have_tag = 0;
>>>
>>> +	if (!(ifp->if_drv_flags & IFF_DRV_RUNNING)) {
>>> +		return;
>>> +	}
>>> +
>>>  #ifdef DEVICE_POLLING
>>>  	if (ifp->if_capenable & IFCAP_POLLING) {
>>>  		if (sc->rxcycles <= 0)
>>>
>>> This prevents any panics during shutdown under heavy load, and as it
>>> turns out (I feel stupid for not looking), em(4) already has this
>>> check in its em_rxeof() function (right at the top of the loop). I'm
>>> more than happy to change it to the em style, but the above seems
>>> reasonable to me, though I have to verify that bailing out doesn't skip
>>> anything after the loop that matters from a hardware standpoint (I don't
>>> think so, because bge_stop() did all the dirty work, so I believe
>>> touching any registers after that from bge_rxeof() is a bad idea).
>>>
>>> Preliminary testing shows no more panics when starting and stopping
>>> ports under heavy load (panics were almost immediate otherwise).
>>>
>>> Thoughts?
>>
>> I think this would solve the problem, but I'm not sure whether it would
>> add overhead on the RX path. It seems that there is a race between
>> bge_release_resources() and bge_intr(); I mean, might it be a good idea
>> to "drain" bge_intr() instead?
>
> Are you talking about detach time? Because bge_stop() gets called
> before bge_release_resources() and stops host interrupts, so where is
> the race? At this point no more interrupts should be delivered to
> bge_intr() (I can confirm this from the spec, since the BGE spec has
> been released publicly). So why would you "drain" it at this point?
> (The hardware is down, including the firmware.)
>
> I agree it adds a little overhead to the standard bge_rxeof() path,
> which is very sensitive to change. However, I think the check at the
> top is tolerable, since the alternative is a crash. It's very easy to
> reproduce: flood a Broadcom card with traffic, then stop the card and
> let the race begin... it will go down in bge_rxeof() after bge_stop()
> releases the lock.
>
> I didn't look at changing anything structurally to make this whole
> predicament better, but at minimum there should be a guard against
> this, no?
>
> -aps

http://www.freebsd.org/cgi/query-pr.cgi?pr=134548

To track... with patch (though the spacing got mangled, my apologies; I moved the check into the while loop condition a la em). I've tested this with zero issues so far.

-aps

Received on Thu May 14 2009 - 19:17:02 UTC
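A minimal, self-contained C sketch of the interleaving described in the thread, under loudly stated assumptions: fake_softc, fake_stop(), input_maybe_stop(), fake_rxeof() and run_scenario() are hypothetical names, not bge(4) or em(4) code, and the softc mutex is modeled by a plain flag so the interleaving runs deterministically on one thread. The RX loop snapshots its consumer and producer indices once, drops the "lock" around the input call, and the input callback plays the part of bge_stop() winning the lock. With the running check in the while condition (the em-style placement), the loop stops once the flag is cleared. Without it, the loop keeps walking descriptors from a ring that has already been torn down. In this model, a check made only on entry to the function would not help, because the stop happens while the loop is already in progress.

```c
#include <assert.h>
#include <stdbool.h>

#define RING_SIZE 8

/* Hypothetical, heavily simplified stand-in for a driver softc.
 * This is NOT the real bge(4) structure. */
struct fake_softc {
	bool locked;        /* models the softc mutex (single-threaded model) */
	bool running;       /* models IFF_DRV_RUNNING in ifp->if_drv_flags */
	bool ring_valid;    /* cleared when "stop" tears down the RX ring */
	int  cons;          /* saved consumer index */
	int  prod;          /* producer index reported by the "hardware" */
	int  input_calls;   /* packets handed to the input function */
	int  stop_on_call;  /* input call on which a concurrent stop wins the lock */
	int  stale_touches; /* descriptors processed after teardown (the bug) */
};

static void sc_lock(struct fake_softc *sc)   { assert(!sc->locked); sc->locked = true; }
static void sc_unlock(struct fake_softc *sc) { assert(sc->locked);  sc->locked = false; }

/* Models bge_stop(): must be called with the lock held. */
static void fake_stop(struct fake_softc *sc)
{
	assert(sc->locked);
	sc->running = false;
	sc->ring_valid = false;
	sc->cons = sc->prod = 0;
}

/* Models if_input: runs with the lock dropped, so a stop may sneak in here. */
static void input_maybe_stop(struct fake_softc *sc)
{
	assert(!sc->locked);
	if (++sc->input_calls == sc->stop_on_call) {
		sc_lock(sc);      /* the stopping thread acquires the lock... */
		fake_stop(sc);    /* ...quiesces the "hardware"... */
		sc_unlock(sc);    /* ...and releases it again */
	}
}

/* Models the RX loop; check_running enables the em-style guard. */
static void fake_rxeof(struct fake_softc *sc, bool check_running)
{
	int cons = sc->cons;  /* indices are snapshotted once, before the loop */
	int prod = sc->prod;

	assert(sc->locked);
	while (cons != prod && (!check_running || sc->running)) {
		if (!sc->ring_valid)
			sc->stale_touches++;  /* would touch a torn-down ring */
		cons = (cons + 1) % RING_SIZE;
		sc_unlock(sc);        /* lock dropped around the input call */
		input_maybe_stop(sc);
		sc_lock(sc);
	}
}

/* One RX pass with five pending descriptors; the stop wins on packet 1. */
static struct fake_softc run_scenario(bool check_running)
{
	struct fake_softc sc = {
		.running = true, .ring_valid = true,
		.cons = 0, .prod = 5, .stop_on_call = 1,
	};

	sc_lock(&sc);
	fake_rxeof(&sc, check_running);
	sc_unlock(&sc);
	return sc;
}
```

In this model, run_scenario(false) consumes the four descriptors after the first one from the torn-down ring (stale_touches is 4), while run_scenario(true) hands off one packet and then leaves the loop as soon as it sees the running flag cleared.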