Re: panic with tcp timers

From: Randall Stewart <rrs_at_netflix.com> Date: Sat, 25 Jun 2016 04:48:51 -0400 · This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:41:06 UTC

So

All of our timers in TCP do something like
---------------------
INFO-LOCK
INP_WLOCK

if (inp needs to be dropped) {
    drop-it
}
do other work

UNLOCK-INP
UNLOCK-INFO
------------------

And generally the path “inp needs to be dropped” is rarely taken.

So why don’t we change the procedure to something like:

INP_WLOCK
if (inp needs to be dropped) {
     inp_ref()
     INP_WUNLOCK()
     INFO_LOCK()
     INP_WLOCK()
     if (inp_release_ref) {
	   /* victory its dead */
           INFO_UNLOCK
           return
      }
      do-the-drop
  }

This way we would only go grab the INFO lock in those rarer cases
when we *do* actually want to kill the tcb and at the same time
it would make the current callout system work correctly.. which as
many have pointed out would be much better if we could just let the
lock be gotten by the callout system. Hmm maybe with this scheme we
could even do that...

R

> On Jun 20, 2016, at 1:45 PM, Julien Charbon <jch_at_FreeBSD.org> wrote:
> 
> 
> Hi,
> 
> On 6/20/16 11:58 AM, Gleb Smirnoff wrote:
>> On Mon, Jun 20, 2016 at 11:55:55AM +0200, Julien Charbon wrote:
>> J> > On Fri, Jun 17, 2016 at 11:27:39AM +0200, Julien Charbon wrote:
>> J> > J> > Comparing stable/10 and head, I see two changes that could
>> J> > J> > affect that:
>> J> > J> > 
>> J> > J> > - callout_async_drain
>> J> > J> > - switch to READ lock for inp info in tcp timers
>> J> > J> > 
>> J> > J> > That's why you are in To, Julien and Hans :)
>> J> > J> > 
>> J> > J> > We continue investigating, and I will keep you updated.
>> J> > J> > However, any help is welcome. I can share cores.
>> J> > 
>> J> > Now, spending some time with cores and adding a bunch of
>> J> > extra CTRs, I have a sequence of events that lead to the
>> J> > panic. In short, the bug is in the callout system. It seems
>> J> > to be not relevant to the callout_async_drain, at least for
>> J> > now. The transition to READ lock unmasked the problem, that's
>> J> > why NetflixBSD 10 doesn't panic.
>> J> > 
>> J> > The panic requires heavy contention on the TCP info lock.
>> J> > 
>> J> > [CPU 1] the callout fires, tcp_timer_keep entered
>> J> > [CPU 1] blocks on INP_INFO_RLOCK(&V_tcbinfo);
>> J> > [CPU 2] schedules the callout
>> J> > [CPU 2] tcp_discardcb called
>> J> > [CPU 2] callout successfully canceled
>> J> > [CPU 2] tcpcb freed
>> J> > [CPU 1] unblocks... panic
>> J> > 
>> J> > When the lock was WLOCK, all contenders were resumed in a
>> J> > sequence they came to the lock. Now, that they are readers,
>> J> > once the lock is released, readers are resumed in a "random"
>> J> > order, and this allows tcp_discardcb to go before the old
>> J> > running callout, and this unmasks the panic.
>> J> 
>> J>  Highly interesting.  I should be able to reproduce that (will be useful
>> J> for testing the corresponding fix).
>> J> 
>> J>  Fix proposal:  If callout_async_drain() returns 0 (fail) (instead of 1
>> J> (success) here) when the callout cancellation is a success _but_ the
>> J> callout is current running, that should fix it.
>> J> 
>> J>  For the history:  It comes back to my old callout question:
>> J> 
>> J>  Does _callout_stop_safe() is allowed to return 1 (success) even if the
>> J> callout is still currently running;  a.k.a. it is not because you
>> J> successfully cancelled a callout that the callout is not currently running.
>> J> 
>> J>  We did propose a patch to make _callout_stop_safe() returns 0 (fail)
>> J> when the callout is currently running:
>> J> 
>> J> callout_stop() should return 0 when the callout is currently being
>> J> serviced and indeed unstoppable
>> J> https://reviews.freebsd.org/differential/changeset/?ref=62513&whitespace=ignore-most
>> J> 
>> J>  But this change impacted too many old code paths and was interesting
>> J> only for TCP timers and thus was abandoned.
>> 
>> The fix I am working on now is doing exactly that. callout_reset must
>> return 0 if the callout is currently running.
>> 
>> What are the old paths impacted?
> 
> Actually all the paths that check the callout_stop() return value AND
> call both callout_reset() and callout_stop() AND use mpsafe callout().
> And for each path, we would have to check our patch was ok (or not).
> 
> Thus, if you only do the change in callout_async_drain() context, you
> don't impact the "old" callout_stop() behavior and get the desired
> behavior for the TCP timers.  Might be a good trade-off...
> 
> My 2 cents.
> 
> --
> Julien

--------
Randall Stewart
rrs_at_netflix.com
803-317-4952