EBR fix for life cycle races was Re: panic with tcp timers

From: Matthew Macy <mmacy_at_nextbsd.org>
Date: Tue, 28 Jun 2016 23:19:45 -0700
 ---- On Tue, 28 Jun 2016 15:51:57 -0700 K. Macy <kmacy_at_freebsd.org> wrote ---- 
 > On Tue, Jun 28, 2016 at 10:51 AM, Matthew Macy <mmacy_at_nextbsd.org> wrote: 
 > > You guys should really look at Samy Bahra's epoch based reclamation. I solved a similar problem in drm/linuxkpi using it. 
 >  
 > The point being that this is a bug in the TCP life cycle handling 
 > _not_ in callouts. Churning the callout interface is not the best / 
 > only solution. 
 > -M 
 
Please see see D7017/D7018 as an example for an ultimately more robust solution than continued fiddling with the callout interface.

https://reviews.freebsd.org/D7018



Cheers.

-M

 > >  ---- On Tue, 28 Jun 2016 02:58:56 -0700 Julien Charbon <jch_at_freebsd.org> wrote ---- 
 > >  > 
 > >  >  Hi Randall, 
 > >  > 
 > >  > On 6/25/16 4:41 PM, Randall Stewart via freebsd-net wrote: 
 > >  > > Ok 
 > >  > > 
 > >  > > Lets try this again with my source changed to my _at_freebsd.net :-) 
 > >  > > 
 > >  > > Now I am also attaching a patch for you Gleb, this will take some poking to 
 > >  > > get in to your NF-head since it incorporates some changes we made earlier. 
 > >  > > 
 > >  > > I think this will fix the problem.. i.e. dealing with two locks in the callout system (which it was 
 > >  > > never meant to have done).. 
 > >  > > 
 > >  > > Note we probably can move the code to use the callout lock init now.. but lets see if this works 
 > >  > > on your setup on c096 and if so we can think about doing that. 
 > >  > 
 > >  >  Thanks for proposing a patch.  I believe your patch will work with 
 > >  > callout lock init, but not without:  You still have a use-after-free 
 > >  > issue on the tcpcb without callout lock init. 
 > >  > 
 > >  >  The case being subtle as usual, let me try to describe that could happen: 
 > >  > 
 > >  >  With your patch we have: 
 > >  > 
 > >  > void 
 > >  > tcp_timer_keep(void *xtp) 
 > >  > { 
 > >  >         struct tcpcb *tp = xtp; 
 > >  >         struct tcptemp *t_template; 
 > >  >         struct inpcb *inp; 
 > >  >         CURVNET_SET(tp->t_vnet); 
 > >  > #ifdef TCPDEBUG 
 > >  >         int ostate; 
 > >  > 
 > >  >         ostate = tp->t_state; 
 > >  > #endif 
 > >  >         inp = tp->t_inpcb; 
 > >  >         KASSERT(inp != NULL, ("%s: tp %p tp->t_inpcb == NULL", __func__, 
 > >  > tp)); 
 > >  >         INP_WLOCK(inp); 
 > >  >         if (callout_pending(&tp->t_timers->tt_keep) ### Use after free 
 > >  > of tp here 
 > >  >             !callout_active(&tp->t_timers->tt_keep)) { 
 > >  >                 INP_WUNLOCK(inp); 
 > >  >                 CURVNET_RESTORE(); 
 > >  >                 return; 
 > >  >         } 
 > >  >         ... 
 > >  > 
 > >  >  The use-after-free scenario: 
 > >  > 
 > >  > [CPU 1] the callout fires, tcp_timer_keep entered 
 > >  > [CPU 1] blocks on INP_WLOCK(inp); 
 > >  > [CPU 2] schedules tcp_timer_keep with callout_reset() 
 > >  > [CPU 2] tcp_discardcb called 
 > >  > [CPU 2] tcp_timer_keep callout successfully canceled 
 > >  > [CPU 2] tcpcb freed 
 > >  > [CPU 1] unblocks, the tcpcb is used 
 > >  > 
 > >  >  Then the tcpcb will used just after being freed...  Might also crash or 
 > >  > not depending in the case. 
 > >  > 
 > >  >  Extra notes: 
 > >  > 
 > >  >  o The invariant I see here is:  The "callout successfully canceled" 
 > >  > step should never happen when "the callout is currently being executed". 
 > >  > 
 > >  >  o Solutions I see to enforce this invariant: 
 > >  > 
 > >  >  - First solution:  Use callout lock init with inp lock, your patch 
 > >  > seems to permit that now. 
 > >  > 
 > >  >  - Second solution:  Change callout_async_drain() behavior:  It can 
 > >  > return 0 (fail) when the callout is currently being executed (no matter 
 > >  > what). 
 > >  > 
 > >  >  - Third solution:  Don't trust callout_async_drain(callout) return 
 > >  > value of 1 (success) if the previous call of callout_reset(callout) 
 > >  > returned 0 (fail).  That was the exact purpose of r284261 change, but 
 > >  > this solution is also step backward in modernization of TCP 
 > >  > timers/callout... 
 > >  > 
 > >  > https://svnweb.freebsd.org/base/stable/10/sys/netinet/tcp_timer.c?r1=284261&r2=284260&pathrev=284261 
 > >  > 
 > >  >  Hopefully my description is clear enough... 
 > >  > 
 > >  > -- 
 > >  > Julien 
 > >  > 
 > >  > 
 > > 
 > > _______________________________________________ 
 > > freebsd-current_at_freebsd.org mailing list 
 > > https://lists.freebsd.org/mailman/listinfo/freebsd-current 
 > > To unsubscribe, send any mail to "freebsd-current-unsubscribe_at_freebsd.org" 
 > _______________________________________________ 
 > freebsd-current_at_freebsd.org mailing list 
 > https://lists.freebsd.org/mailman/listinfo/freebsd-current 
 > To unsubscribe, send any mail to "freebsd-current-unsubscribe_at_freebsd.org" 
 > 
Received on Wed Jun 29 2016 - 04:19:55 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:41:06 UTC