Re: Instant panic while trying run ports-mgmt/poudriere

From: Konstantin Belousov <kostikbel_at_gmail.com>
Date: Tue, 1 Sep 2015 21:44:39 +0300
On Tue, Sep 01, 2015 at 11:24:06AM -0700, John-Mark Gurney wrote:
> Andriy Gapon wrote this message on Thu, Aug 27, 2015 at 23:21 +0300:
> > On 27/08/2015 21:09, John-Mark Gurney wrote:
> > > Andriy Gapon wrote this message on Thu, Aug 27, 2015 at 10:21 +0300:
> > >> On 27/08/2015 02:36, John-Mark Gurney wrote:
> > >>> We should/cannot get here w/ an empty list.  If we do, then there is
> > >>> something seriously wrong...  The current kn (which we must have as we
> > >>> are here) MUST be on the list, but as you just showed, there are no
> > >>> knotes on the list.
> > >>>
> > >>> Can you get me a print of the knote?  That way I can see what flags
> > >>> are on it?
> > >>
> > >> Apologies if the following might sound a little bit patronizing, but it
> > >> seems that you have got all the facts correctly, but somehow the
> > >> connection between them did not become clear.
> > >>
> > >> So:
> > >> 1. The list originally is NOT empty.  I guess that it has one entry, but
> > >> that's an unimportant detail.
> > >> 2. This is why the loop is entered. It's a fact that it is entered.
> > >> 3. The list becomes empty precisely because the entry is removed during
> > >> the iteration in the loop (as kib has explained).  It's a fact that the
> > >> list became empty at least in the panic that I reported.
> > > 
> > > On you're latest dump, you said:
> > > Here is another +1 with r286922.                                                
> > > I can add a couple of bits of debugging data:                                   
> > >                                                                                 
> > > (kgdb) fr 8                                                                     
> > > #8  0xffffffff80639d60 in knote (list=0xfffff8019a733ea0,                       
> > > hint=2147483648, lockflags=<value optimized out>) at                            
> > > /usr/src/sys/kern/kern_event.c:1964                                             
> > > 1964                    } else if ((lockflags & KNF_NOKQLOCK) != 0) {           
> > > 
> > > First off, that can't be r286922, per:
> > > https://svnweb.freebsd.org/base/stable/10/sys/kern/kern_event.c?annotate=286922
> > > 
> > > line 1964 is blank...  The line of code above should be at line 1884,
> > > so not sure what is wrong here...
> > 
> > No, it can not be indeed, because I am running head.
> > r286922 was the latest version of the repository, not the head branch,
> > at the moment when I pulled the repository via git.
> > 
> > > Assuming that the pc really is at the line, f_event has not yet been
> > > called,
> > 
> > Even on the second loop iteration?
> > 
> > >which is why I said that the list cannot be empty yet, as
> > > f_event hasn't been called yet to remove the knote...  It could be that
> > > optimization moved stuff around, but if that is the case, then the
> > > above wasn't useful..
> > 
> > I provided the disassembly of the code as well, it's very obvious how
> > the code was translated.
> > 
> > >> 4. The element is not only unlinked from the list, but its memory is
> > >> also freed.
> > > 
> > > Where is the memory freed?  A knote MUST NOT be freed in an f_event
> > > handler.  The only location that a list element is allowed to be
> > > freed is in knote_drop, which must happen after f_detach is called,
> > > but that can't/won't happen from knote (I believe the timer handles
> > > this specially, but we are talking about normal knlist type filters)..
> > 
> > Well, right.  knote()->filt_proc()->knlist_remove_inevent() just removes
> > the knote from the list.  But then there is KNOTE_ACTIVATE() that passes
> > the knote to a different owner (so to say). And given that the knote has
> > EV_ONESHOT set on it (in filt_proc) and that poudriere can put quite a
> > stress load on a system, I am not surprised that another thread gets a
> > chance to call knote_drop() on the knote before the original thread
> > proceeds to the next iteration.
> 
> Ok, I think I have identified the race that you guys were trying to
> tell me about, and though the _SAFE macro would be a similar fix, I'm
> going to rewrite the loop so that this is more explicit on what
> is happening here...
> 
> So, the race is this...  In knote, when the note is removed by
> f_event, things are find until the KQ lock is dropped...  Once this
> lock is dropped, effective ownership of the knote is transfered
> from the knlist to the kq lock as the _DETACHED flag is now set,
> which means that reading any fields from that note is undefined..
> 
> Once the kq lock is released in knote, then it is possible for a
> functional like kqueue_scan to endup knote_drop'ing the note...
Did you read the commit message and my previous N messages about the
subject ? Can you point me at a difference between the commit message
and the above text ?

I object against the your pointless and fact-less backout request
and have no intention of complying with it.

> 
> Upon further examination, we may have another race as in knote_drop,
> when we call f_detach, we don't have the list locked, nor kq, which
> means that knlist_remove_inevent could be modifing the list at the same
> time that kqueue_register could be modifing it to remove a _DELETED
> note...
> 
> I'd like to close both races at the same time since they go
> hand in hand...
> 
> > > The rest of your explination is invalid due to the invalid assumption
> > > of this point...
> > 
> > Eagerly waiting for your explanation...
> > 
> > > If you can provide to me where the knote is free'd in knote, w/
> > > function/line number stack trace (does not have to be dump, but a
> > > sample call path), then I'll reconsider, and fix that bug...
> > >> 5. That's why we have the use after free: SLIST_FOREACH is trying to get
> > >> a pointer to a next element from the freed memory.
> > >> 6. This is why the commit for trashing the freed memory made all the
> > >> difference: previously the freed memory was unlikely to be re-used /
> > >> modified, so the use-after-free had a high chance of succeeding.  It's a
> > >> fact that in my panic there was an attempt to dereference a trashed pointer.
> > >> 7. Finally, this is why SLIST_FOREACH_SAFE helps here: we stash the
> > >> pointer to the next element beforehand and, thus, we do not access the
> > >> freed memory.
> > >>
> > >> Please let me know if you see any fault in above reasoning or if
> > >> something is still no clear.
> 
> -- 
>   John-Mark Gurney				Voice: +1 415 225 5579
> 
>      "All that I will do, has been done, All that I have, has not."
> _______________________________________________
> freebsd-current_at_freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe_at_freebsd.org"
Received on Tue Sep 01 2015 - 16:44:46 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:59 UTC