Re: panic with tcp timers

From: Randall Stewart <rrs_at_netflix.com> Date: Sat, 25 Jun 2016 10:41:24 -0400 · This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:41:06 UTC

Ok

Lets try this again with my source changed to my _at_freebsd.net :-)

Now I am also attaching a patch for you Gleb, this will take some poking to
get in to your NF-head since it incorporates some changes we made earlier.

I think this will fix the problem.. i.e. dealing with two locks in the callout system (which it was
never meant to have done)..

Note we probably can move the code to use the callout lock init now.. but lets see if this works
on your setup on c096 and if so we can think about doing that.

R

> On Jun 25, 2016, at 4:48 AM, Randall Stewart <rrs_at_netflix.com> wrote:
> 
> So
> 
> All of our timers in TCP do something like
> ---------------------
> INFO-LOCK
> INP_WLOCK
> 
> if (inp needs to be dropped) {
>    drop-it
> }
> do other work
> 
> UNLOCK-INP
> UNLOCK-INFO
> ------------------
> 
> And generally the path “inp needs to be dropped” is rarely taken.
> 
> So why don’t we change the procedure to something like:
> 
> INP_WLOCK
> if (inp needs to be dropped) {
>     inp_ref()
>     INP_WUNLOCK()
>     INFO_LOCK()
>     INP_WLOCK()
>     if (inp_release_ref) {
> 	   /* victory its dead */
>           INFO_UNLOCK
>           return
>      }
>      do-the-drop
>  }
> 
> This way we would only go grab the INFO lock in those rarer cases
> when we *do* actually want to kill the tcb and at the same time
> it would make the current callout system work correctly.. which as
> many have pointed out would be much better if we could just let the
> lock be gotten by the callout system. Hmm maybe with this scheme we
> could even do that...
> 
> R
> 
> 
>> On Jun 20, 2016, at 1:45 PM, Julien Charbon <jch_at_FreeBSD.org> wrote:
>> 
>> 
>> Hi,
>> 
>> On 6/20/16 11:58 AM, Gleb Smirnoff wrote:
>>> On Mon, Jun 20, 2016 at 11:55:55AM +0200, Julien Charbon wrote:
>>> J> > On Fri, Jun 17, 2016 at 11:27:39AM +0200, Julien Charbon wrote:
>>> J> > J> > Comparing stable/10 and head, I see two changes that could
>>> J> > J> > affect that:
>>> J> > J> > 
>>> J> > J> > - callout_async_drain
>>> J> > J> > - switch to READ lock for inp info in tcp timers
>>> J> > J> > 
>>> J> > J> > That's why you are in To, Julien and Hans :)
>>> J> > J> > 
>>> J> > J> > We continue investigating, and I will keep you updated.
>>> J> > J> > However, any help is welcome. I can share cores.
>>> J> > 
>>> J> > Now, spending some time with cores and adding a bunch of
>>> J> > extra CTRs, I have a sequence of events that lead to the
>>> J> > panic. In short, the bug is in the callout system. It seems
>>> J> > to be not relevant to the callout_async_drain, at least for
>>> J> > now. The transition to READ lock unmasked the problem, that's
>>> J> > why NetflixBSD 10 doesn't panic.
>>> J> > 
>>> J> > The panic requires heavy contention on the TCP info lock.
>>> J> > 
>>> J> > [CPU 1] the callout fires, tcp_timer_keep entered
>>> J> > [CPU 1] blocks on INP_INFO_RLOCK(&V_tcbinfo);
>>> J> > [CPU 2] schedules the callout
>>> J> > [CPU 2] tcp_discardcb called
>>> J> > [CPU 2] callout successfully canceled
>>> J> > [CPU 2] tcpcb freed
>>> J> > [CPU 1] unblocks... panic
>>> J> > 
>>> J> > When the lock was WLOCK, all contenders were resumed in a
>>> J> > sequence they came to the lock. Now, that they are readers,
>>> J> > once the lock is released, readers are resumed in a "random"
>>> J> > order, and this allows tcp_discardcb to go before the old
>>> J> > running callout, and this unmasks the panic.
>>> J> 
>>> J>  Highly interesting.  I should be able to reproduce that (will be useful
>>> J> for testing the corresponding fix).
>>> J> 
>>> J>  Fix proposal:  If callout_async_drain() returns 0 (fail) (instead of 1
>>> J> (success) here) when the callout cancellation is a success _but_ the
>>> J> callout is current running, that should fix it.
>>> J> 
>>> J>  For the history:  It comes back to my old callout question:
>>> J> 
>>> J>  Does _callout_stop_safe() is allowed to return 1 (success) even if the
>>> J> callout is still currently running;  a.k.a. it is not because you
>>> J> successfully cancelled a callout that the callout is not currently running.
>>> J> 
>>> J>  We did propose a patch to make _callout_stop_safe() returns 0 (fail)
>>> J> when the callout is currently running:
>>> J> 
>>> J> callout_stop() should return 0 when the callout is currently being
>>> J> serviced and indeed unstoppable
>>> J> https://reviews.freebsd.org/differential/changeset/?ref=62513&whitespace=ignore-most
>>> J> 
>>> J>  But this change impacted too many old code paths and was interesting
>>> J> only for TCP timers and thus was abandoned.
>>> 
>>> The fix I am working on now is doing exactly that. callout_reset must
>>> return 0 if the callout is currently running.
>>> 
>>> What are the old paths impacted?
>> 
>> Actually all the paths that check the callout_stop() return value AND
>> call both callout_reset() and callout_stop() AND use mpsafe callout().
>> And for each path, we would have to check our patch was ok (or not).
>> 
>> Thus, if you only do the change in callout_async_drain() context, you
>> don't impact the "old" callout_stop() behavior and get the desired
>> behavior for the TCP timers.  Might be a good trade-off...
>> 
>> My 2 cents.
>> 
>> --
>> Julien
> 
> --------
> Randall Stewart
> rrs_at_netflix.com
> 803-317-4952
> 
> 
> 
> 
> 

--------
Randall Stewart
rrs_at_netflix.com
803-317-4952