On Thu, May 14, 2009 at 5:00 PM, John Baldwin <jhb_at_freebsd.org> wrote:
> On Thursday 14 May 2009 3:47:16 pm Xin LI wrote:
>> Hi, Alexander,
>>
>> Alexander Sack wrote:
>> > Hello:
>> >
>> > Under heavy traffic (100% utilization GIGE on a 2 port BGE card)
>> > running BGE CURRENT driver I see panics on shutdown.  The reason is
>> > because bge_rxeof(), while processing its RX ring of BDs, drops the
>> > softc lock when it hands a packet off to its input function.  If
>> > bge_stop() is waiting for the lock, it will then proceed to acquire
>> > it and quiesce the hardware (resetting the card, clearing out BDs
>> > etc.).  Once bge_stop() releases the softc lock, bge_rxeof(), under
>> > an interrupt context (no polling here), will reacquire it and
>> > continue to process the ring, which is a bad idea.  It should check
>> > to see if the card is still running before continuing to process
>> > BDs (i.e. once IFF_DRV_RUNNING has been reset by bge_stop(),
>> > bge_rxeof() is done, bail out).
>> >
>> > Here is my first go around with this patch:
>> >
>> >
>> > --- if_bge.c.CURRENT    2009-05-14 14:39:39.000000000 -0400
>> > +++ if_bge.c    2009-05-14 14:39:24.000000000 -0400
>> > @@ -3081,6 +3081,10 @@
>> >         uint16_t vlan_tag = 0;
>> >         int have_tag = 0;
>> >
>> > +       if (!(ifp->if_drv_flags & IFF_DRV_RUNNING)) {
>> > +               return;
>> > +       }
>> > +
>> >  #ifdef DEVICE_POLLING
>> >         if (ifp->if_capenable & IFCAP_POLLING) {
>> >                 if (sc->rxcycles <= 0)
>> >
>> >
>> > This prevents any panics during shutdown under heavy load and AS IT
>> > TURNS out (I feel stupid for not looking) em(4) already had this
>> > check in its em_rxeof() function (right at the top of the loop).
>> > I'm more than happy changing it to the em style, but the above seems
>> > reasonable to me, though I have to verify there isn't anything
>> > missing off the loop from a hardware standpoint (I don't think so,
>> > because bge_stop() did all the dirty work, so I believe touching any
>> > registers after that from bge_rxeof() is a bad idea).
>> >
>> > Preliminary testing shows no more panics when starting and stopping
>> > ports under heavy load (panics were almost immediate otherwise).
>> >
>> > Thoughts?
>>
>> I think this would solve the problem but I'm not sure whether this would
>> increase some overhead on the RX path.  It seems that there is a race
>> between bge_release_resources() and bge_intr(), I mean, it might be a
>> good idea to "drain" bge_intr() instead?
>
> Usually just detach() drains the interrupt handler.  However, an 'ifconfig
> bge0 down' could probably provoke this as well.  I would probably do the
> check right after re-acquiring the lock at the bottom of the loop before
> touching anything else.

Yeah John, you've got a point there.  I submitted the patch with the
check in the while condition, which I BELIEVE is functionally
equivalent (don't ask me which one is faster), i.e. as soon as we
reacquire the lock, check the flag, since bge_stop() might have reset
it.

If you get a chance, can you look at the PR and let me know if you
think it looks good?  I really want this fixed in 7.x, to be honest,
since it's a real headache (I was working on another subsystem when I
ran into this).

-aps

Received on Thu May 14 2009 - 19:27:30 UTC
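
The race discussed in this thread can be sketched outside the kernel. The following is a minimal single-threaded model, not code from if_bge.c: every name in it (fake_softc, fake_lock, fake_stop, fake_input, fake_rxeof, stop_after) is hypothetical, the "lock" is a plain flag, and the stop is injected from the input hand-off to stand in for bge_stop() winning the lock while it is dropped. It only illustrates why re-checking the running flag after reacquiring the lock ends the loop before stale descriptors are touched.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver softc; not the real bge types. */
struct fake_softc {
    bool running;     /* models IFF_DRV_RUNNING in ifp->if_drv_flags */
    bool locked;      /* models the softc mutex as a simple flag */
    int  processed;   /* packets handed to the input function */
    int  stop_after;  /* inject a stop during this hand-off; 0 = never */
};

static void fake_lock(struct fake_softc *sc)
{
    assert(!sc->locked);    /* non-recursive, like the softc lock */
    sc->locked = true;
}

static void fake_unlock(struct fake_softc *sc)
{
    assert(sc->locked);
    sc->locked = false;
}

/* Models the stop path: take the lock, clear the running flag, release. */
static void fake_stop(struct fake_softc *sc)
{
    fake_lock(sc);
    sc->running = false;    /* after this the ring must not be touched */
    fake_unlock(sc);
}

/* Models the input hand-off, which runs with the lock dropped. */
static void fake_input(struct fake_softc *sc)
{
    sc->processed++;
    if (sc->stop_after != 0 && sc->processed == sc->stop_after)
        fake_stop(sc);      /* the stop wins the lock in the window */
}

/*
 * Models the RX loop. The caller holds the lock. Returns how many
 * descriptors were touched after the stop had already run, which is
 * the situation that panicked in the report above.
 */
static int fake_rxeof(struct fake_softc *sc, int todo, bool check_running)
{
    int stale = 0;

    while (todo > 0) {
        /* The proposed fix: re-check the flag once the lock is held. */
        if (check_running && !sc->running)
            break;
        if (!sc->running)
            stale++;        /* touching a ring the stop tore down */
        todo--;
        fake_unlock(sc);    /* lock dropped around the hand-off */
        fake_input(sc);
        fake_lock(sc);      /* reacquired; state may have changed */
    }
    return stale;
}
```

Calling fake_rxeof() with check_running set to false and a stop injected during the second hand-off keeps walking the remaining descriptors; with check_running set to true the loop exits on the next iteration. Checking at the top of the loop and checking right after the lock is reacquired at the bottom test the flag at the same point between hand-offs; the top-of-loop form additionally covers the case where the interface was already stopped on entry.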