Re: Instant panic while trying run ports-mgmt/poudriere

From: John-Mark Gurney <jmg_at_funkthat.com>
Date: Wed, 26 Aug 2015 16:51:18 -0700
Konstantin Belousov wrote this message on Mon, Aug 24, 2015 at 11:10 +0300:
> On Sun, Aug 23, 2015 at 10:35:44PM -0700, John-Mark Gurney wrote:
> > Konstantin Belousov wrote this message on Sun, Aug 23, 2015 at 15:54 +0300:
> > >  	if (kev->flags & EV_ADD)
> > > -		tkn = knote_alloc(waitok);	/* prevent waiting with locks */
> > > +		/*
> > > +		 * Prevent waiting with locks.  Non-sleepable
> > > +		 * allocation failures are handled in the loop, only
> > > +		 * if the spare knote appears to be actually required.
> > > +		 */
> > > +		tkn = knote_alloc(waitok);
> > 
> > if you add this comment, please add curly braces around the block...
> Ok.
> 
> > 
> > >  	else
> > >  		tkn = NULL;
> > >  
> > > _at__at_ -1310,8 +1315,7 _at__at_ done:
> > >  		FILEDESC_XUNLOCK(td->td_proc->p_fd);
> > >  	if (fp != NULL)
> > >  		fdrop(fp, td);
> > > -	if (tkn != NULL)
> > > -		knote_free(tkn);
> > > +	knote_free(tkn);
> > 
> > Probably should just change knote_free to a static inline that does
> > a uma_zfree as uma_zfree also does nothing is the input is NULL...
> This was already done in the patch (the removal of the NULL check in
> knote_free()). I usually do not add excessive inline keywords. Compilers
> are good, sometimes even too good, at figuring out the possibilities for
> inlining. knote_free() is inlined automatically.

Though it is, if we really change knote_free to a bare uma_free, then
either mark it inline (to be explicit about it's behavior), or make a
macro out of it... I don't particularly like functions that contain one
line of simple code...

> > > _at__at_ -1948,7 +1948,7 _at__at_ knote(struct knlist *list, long hint, int lockflags)
> > >  	 * only safe if you want to remove the current item, which we are
> > >  	 * not doing.
> > >  	 */
> > > -	SLIST_FOREACH(kn, &list->kl_list, kn_selnext) {
> > > +	SLIST_FOREACH_SAFE(kn, &list->kl_list, kn_selnext, tkn) {
> > 
> > Clearly you didn't read the comment that preceeds this line, or at
> > least didn't update it:
> >          * SLIST_FOREACH, SLIST_FOREACH_SAFE is not safe in our case, it is
> >          * only safe if you want to remove the current item, which we are
> >          * not doing.
> > 
> > So, you'll need to be more specific in why this needs to change...
> > When I wrote this code, I spent a lot of time looking at this, and
> > reasoned as to why SLIST_FOREACH_SAFE was NOT correct usage here...
> I explained what happens in the message.  The knote list is modified
> by the filter, see knlist_remove_inevent() call in filt_proc().
>
> > >  		kq = kn->kn_kq;
> > >  		KQ_LOCK(kq);
> > >  		if ((kn->kn_status & (KN_INFLUX | KN_SCAN)) == KN_INFLUX) {
> > > _at__at_ -2385,15 +2385,16 _at__at_ SYSINIT(knote, SI_SUB_PSEUDO, SI_ORDER_ANY, knote_init, NULL);
> > >  static struct knote *
> > >  knote_alloc(int waitok)
> > >  {
> > > -	return ((struct knote *)uma_zalloc(knote_zone,
> > > -	    (waitok ? M_WAITOK : M_NOWAIT)|M_ZERO));
> > > +
> > > +	return (uma_zalloc(knote_zone, (waitok ? M_WAITOK : M_NOWAIT) |
> > > +	    M_ZERO));
> > >  }
> > >  
> > >  static void
> > 
> > per above, we should add inline here...
> > 
> > >  knote_free(struct knote *kn)
> > >  {
> > > -	if (kn != NULL)
> > > -		uma_zfree(knote_zone, kn);
> > > +
> > > +	uma_zfree(knote_zone, kn);
> > >  }
> > >  
> > >  /*
> > 
> > I agree w/ the all the non-SLIST changes, but I disagree w/ the SLIST
> > change as I don't believe that all cases was considered...
> What cases do you mean ?
> 
> The patch does not unlock knlist lock in the iteration. As such, the
> only thread which could remove elements from the knlist, or rearrange
> the list, while loop is active, is the current thread. So I claim that
> the only the current iterating element can be removed, and the next list
> element stays valid. This is enough for _SAFE loop to work.
>
> Why do you think that _SAFE is incorrect ? Comment talks about very

I can't think of the reason right now, but I do remeber puzzling over
this issue for some hours when I wrote this code, and I had proved
to myself that _SAFE was NOT _SAFE for this use case...

In the quick look I just had, I have not been able to decide one way
or the other, but I'm suspicious that this is a recent issue, as this
code has been running for close to a decade w/o any issues, and wonder
if there was some other change that trigger the issue...

The reason I'm cautious about changing this is that the code has been
running fine for over a decade...  Have you done a full test to
validate that nothing else breaks?

Ok, after looking more at the original dump, this is a use after free
bug...  As I said in another email, it should not be possible to get
into the _FOREACH loop where knlist is an empty list.  If it does,
then there is another major bug that needs to be found...  A simple
change to _SAFE will not fix this use after free bug...

> different case, where the knlist lock is dropped. Then indeed, other
> thread may iterate in parallel, and invalidate the memoized next element
> while KN_INFLUX is set for the current element and knlist is dropped.
> But _SAFE in sys/queue.h never means 'safe for parallel mutators', it
> only means 'safe for the current iterator removing current element'.
> 
> I preferred not to touch the comment until it is confirmed that the
> change help.  I reformulated it now, trying to keep the note about
> unlock (but is it useful ?).

I prefer to prove through logic that the change is correct...

The comment is about a possible future optimization to eliminate
the kqueue_schedtask, though as that is less common (kq in kq) than
standard knote...

-- 
  John-Mark Gurney				Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."
Received on Wed Aug 26 2015 - 21:51:19 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:59 UTC