Re: Segfault in _Unwind_* code called from pthread_exit

From: Konstantin Belousov <kostikbel_at_gmail.com> Date: Sun, 29 Oct 2017 21:13:58 +0200 · This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:41:13 UTC

On Sun, Oct 29, 2017 at 06:23:51PM +0100, Tijl Coosemans wrote:
> On Sat, 26 Aug 2017 21:40:34 +0300 Konstantin Belousov <kostikbel_at_gmail.com> wrote:
> > On Sat, Aug 26, 2017 at 08:28:13PM +0200, Tijl Coosemans wrote:
> >> I did consider using
> >> a CFI directive (see patch below) and it works, but it's architecture
> >> specific and it's inserted after the function prologue so there's still
> >> a window of a few instructions where a stack unwinder will try to use
> >> the return address.
> >> 
> >> Index: lib/libthr/thread/thr_create.c
> >> ===================================================================
> >> --- lib/libthr/thread/thr_create.c      (revision 322802)
> >> +++ lib/libthr/thread/thr_create.c      (working copy)
> >> _at__at_ -251,6 +251,7 _at__at_ create_stack(struct pthread_attr *pattr)
> >>  static void
> >>  thread_start(struct pthread *curthread)
> >>  {
> >> +       __asm(".cfi_undefined %rip");
> >>         sigset_t set;
> >>  
> >>         if (curthread->attr.suspend == THR_CREATE_SUSPENDED)  
> > 
> > I like this approach much more than the previous patch.  What can be
> > done is to provide asm trampoline which calls thread_start().  There you
> > can add the .cfi_undefined right at the entry.
> >
> > It is somewhat more work than just setting the return address on the
> > kernel-constructed pseudo stack frame, but I believe this is ultimately
> > correct way.  You still can do it only on some arches, if you do not
> > have incentive to code asm for all of them.
> 
> Ok, but then there are two ways to implement the trampoline:
> 
> 1)
> 	movq $0,(%rsp)
> 	jmp thread_start
> 2)
> 	subq $8,%rsp
> 	call thread_start
> 	/* NOTREACHED */
> 
> With 1) you're setting the return address to zero anyway, so you might
> as well do that in the kernel like my first patch.  With 2) you're
> setting up a new call frame, basically redoing what the kernel already
> did and on i386 this also means copying the function argument.
I do not quite understand the second variant, because the stack is not
guaranteed to be zeroed, and it is often not if reused after the previously
exited thread.

The first variant is what I like, but perhaps we need to emulate the
frame as well, i.e. push two zero longs.

Currently kernel does not access the usermode stack for the new thread
unless dictated by ABI (i.e. it does not touch it for 64bit process
on amd64, but have to for 32bit).  I like this property.  Also, the
previous paragraph is indicative: we do not really know in kernel
what ABI the userspace follows.  It might want frame, may be it does
not need it.  It could use other register than %rbp as the frame base,
etc.

> 
> Do you have any preference (or better alternatives), because I think I
> still prefer my first patch.  It's the caller's job to set up the call
> frame, in this case the kernel.  And if the kernel handles it then it
> also works with (hypothetical) implementations other than libthr.
> 
> > Also crt1 probably should get the same treatment, despite we already set
> > %rbp to zero AFAIR.
> 
> I haven't checked but I imagine the return address of the process entry
> point is always zero because the stack is all zeros.
Stack is not zero. The environment and argument strings and auxv are copied
at top, and at the bottom the ps_strings structure is located, so it
is not.

If you commit your existing patch as is, I will not resent.  But I do think
that stuff that can be done in usermode, should be done in usermode, esp.
when the amount of efforts is same.