Re: Instant panic while trying run ports-mgmt/poudriere

From: John-Mark Gurney <jmg_at_funkthat.com>
Date: Thu, 27 Aug 2015 11:09:45 -0700
Andriy Gapon wrote this message on Thu, Aug 27, 2015 at 10:21 +0300:
> On 27/08/2015 02:36, John-Mark Gurney wrote:
> > We should/cannot get here w/ an empty list.  If we do, then there is
> > something seriously wrong...  The current kn (which we must have as we
> > are here) MUST be on the list, but as you just showed, there are no
> > knotes on the list.
> > 
> > Can you get me a print of the knote?  That way I can see what flags
> > are on it?
> 
> Apologies if the following might sound a little bit patronizing, but it
> seems that you have got all the facts correctly, but somehow the
> connection between them did not become clear.
> 
> So:
> 1. The list originally is NOT empty.  I guess that it has one entry, but
> that's an unimportant detail.
> 2. This is why the loop is entered. It's a fact that it is entered.
> 3. The list becomes empty precisely because the entry is removed during
> the iteration in the loop (as kib has explained).  It's a fact that the
> list became empty at least in the panic that I reported.

On you're latest dump, you said:
Here is another +1 with r286922.                                                
I can add a couple of bits of debugging data:                                   
                                                                                
(kgdb) fr 8                                                                     
#8  0xffffffff80639d60 in knote (list=0xfffff8019a733ea0,                       
hint=2147483648, lockflags=<value optimized out>) at                            
/usr/src/sys/kern/kern_event.c:1964                                             
1964                    } else if ((lockflags & KNF_NOKQLOCK) != 0) {           

First off, that can't be r286922, per:
https://svnweb.freebsd.org/base/stable/10/sys/kern/kern_event.c?annotate=286922

line 1964 is blank...  The line of code above should be at line 1884,
so not sure what is wrong here...

Assuming that the pc really is at the line, f_event has not yet been
called, which is why I said that the list cannot be empty yet, as
f_event hasn't been called yet to remove the knote...  It could be that
optimization moved stuff around, but if that is the case, then the
above wasn't useful..

> 4. The element is not only unlinked from the list, but its memory is
> also freed.

Where is the memory freed?  A knote MUST NOT be freed in an f_event
handler.  The only location that a list element is allowed to be
freed is in knote_drop, which must happen after f_detach is called,
but that can't/won't happen from knote (I believe the timer handles
this specially, but we are talking about normal knlist type filters)..

The rest of your explination is invalid due to the invalid assumption
of this point...

If you can provide to me where the knote is free'd in knote, w/
function/line number stack trace (does not have to be dump, but a
sample call path), then I'll reconsider, and fix that bug...
> 5. That's why we have the use after free: SLIST_FOREACH is trying to get
> a pointer to a next element from the freed memory.
> 6. This is why the commit for trashing the freed memory made all the
> difference: previously the freed memory was unlikely to be re-used /
> modified, so the use-after-free had a high chance of succeeding.  It's a
> fact that in my panic there was an attempt to dereference a trashed pointer.
> 7. Finally, this is why SLIST_FOREACH_SAFE helps here: we stash the
> pointer to the next element beforehand and, thus, we do not access the
> freed memory.
> 
> Please let me know if you see any fault in above reasoning or if
> something is still no clear.

-- 
  John-Mark Gurney				Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."
Received on Thu Aug 27 2015 - 16:09:52 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:59 UTC