Re: Segfault in _Unwind_* code called from pthread_exit

From: Andreas Tobler <andreast-list_at_fgznet.ch> Date: Sun, 29 Oct 2017 20:40:46 +0100 · This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:41:13 UTC

On 29.10.17 20:13, Konstantin Belousov wrote:
> On Sun, Oct 29, 2017 at 06:23:51PM +0100, Tijl Coosemans wrote:
>> On Sat, 26 Aug 2017 21:40:34 +0300 Konstantin Belousov <kostikbel_at_gmail.com> wrote:
>>> On Sat, Aug 26, 2017 at 08:28:13PM +0200, Tijl Coosemans wrote:
>>>> I did consider using
>>>> a CFI directive (see patch below) and it works, but it's architecture
>>>> specific and it's inserted after the function prologue so there's still
>>>> a window of a few instructions where a stack unwinder will try to use
>>>> the return address.
>>>>
>>>> Index: lib/libthr/thread/thr_create.c
>>>> ===================================================================
>>>> --- lib/libthr/thread/thr_create.c      (revision 322802)
>>>> +++ lib/libthr/thread/thr_create.c      (working copy)
>>>> _at__at_ -251,6 +251,7 _at__at_ create_stack(struct pthread_attr *pattr)
>>>>   static void
>>>>   thread_start(struct pthread *curthread)
>>>>   {
>>>> +       __asm(".cfi_undefined %rip");
>>>>          sigset_t set;
>>>>   
>>>>          if (curthread->attr.suspend == THR_CREATE_SUSPENDED)
>>>
>>> I like this approach much more than the previous patch.  What can be
>>> done is to provide asm trampoline which calls thread_start().  There you
>>> can add the .cfi_undefined right at the entry.
>>>
>>> It is somewhat more work than just setting the return address on the
>>> kernel-constructed pseudo stack frame, but I believe this is ultimately
>>> correct way.  You still can do it only on some arches, if you do not
>>> have incentive to code asm for all of them.
>>
>> Ok, but then there are two ways to implement the trampoline:
>>
>> 1)
>> 	movq $0,(%rsp)
>> 	jmp thread_start
>> 2)
>> 	subq $8,%rsp
>> 	call thread_start
>> 	/* NOTREACHED */
>>
>> With 1) you're setting the return address to zero anyway, so you might
>> as well do that in the kernel like my first patch.  With 2) you're
>> setting up a new call frame, basically redoing what the kernel already
>> did and on i386 this also means copying the function argument.
> I do not quite understand the second variant, because the stack is not
> guaranteed to be zeroed, and it is often not if reused after the previously
> exited thread.
> 
> The first variant is what I like, but perhaps we need to emulate the
> frame as well, i.e. push two zero longs.
> 
> Currently kernel does not access the usermode stack for the new thread
> unless dictated by ABI (i.e. it does not touch it for 64bit process
> on amd64, but have to for 32bit).  I like this property.  Also, the
> previous paragraph is indicative: we do not really know in kernel
> what ABI the userspace follows.  It might want frame, may be it does
> not need it.  It could use other register than %rbp as the frame base,
> etc.
> 
>>
>> Do you have any preference (or better alternatives), because I think I
>> still prefer my first patch.  It's the caller's job to set up the call
>> frame, in this case the kernel.  And if the kernel handles it then it
>> also works with (hypothetical) implementations other than libthr.
>>
>>> Also crt1 probably should get the same treatment, despite we already set
>>> %rbp to zero AFAIR.
>>
>> I haven't checked but I imagine the return address of the process entry
>> point is always zero because the stack is all zeros.
> Stack is not zero. The environment and argument strings and auxv are copied
> at top, and at the bottom the ps_strings structure is located, so it
> is not.
> 
> If you commit your existing patch as is, I will not resent.  But I do think
> that stuff that can be done in usermode, should be done in usermode, esp.
> when the amount of efforts is same.

Attached what I have for libgcc. It can be applied to gcc5-8, should 
give no issues. The mentioned tc from this thread and mine, 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82635 do pass.

What do you think?

Andreas