On 29.10.17 20:13, Konstantin Belousov wrote: > On Sun, Oct 29, 2017 at 06:23:51PM +0100, Tijl Coosemans wrote: >> On Sat, 26 Aug 2017 21:40:34 +0300 Konstantin Belousov <kostikbel_at_gmail.com> wrote: >>> On Sat, Aug 26, 2017 at 08:28:13PM +0200, Tijl Coosemans wrote: >>>> I did consider using >>>> a CFI directive (see patch below) and it works, but it's architecture >>>> specific and it's inserted after the function prologue so there's still >>>> a window of a few instructions where a stack unwinder will try to use >>>> the return address. >>>> >>>> Index: lib/libthr/thread/thr_create.c >>>> =================================================================== >>>> --- lib/libthr/thread/thr_create.c (revision 322802) >>>> +++ lib/libthr/thread/thr_create.c (working copy) >>>> _at__at_ -251,6 +251,7 _at__at_ create_stack(struct pthread_attr *pattr) >>>> static void >>>> thread_start(struct pthread *curthread) >>>> { >>>> + __asm(".cfi_undefined %rip"); >>>> sigset_t set; >>>> >>>> if (curthread->attr.suspend == THR_CREATE_SUSPENDED) >>> >>> I like this approach much more than the previous patch. What can be >>> done is to provide asm trampoline which calls thread_start(). There you >>> can add the .cfi_undefined right at the entry. >>> >>> It is somewhat more work than just setting the return address on the >>> kernel-constructed pseudo stack frame, but I believe this is ultimately >>> correct way. You still can do it only on some arches, if you do not >>> have incentive to code asm for all of them. >> >> Ok, but then there are two ways to implement the trampoline: >> >> 1) >> movq $0,(%rsp) >> jmp thread_start >> 2) >> subq $8,%rsp >> call thread_start >> /* NOTREACHED */ >> >> With 1) you're setting the return address to zero anyway, so you might >> as well do that in the kernel like my first patch. With 2) you're >> setting up a new call frame, basically redoing what the kernel already >> did and on i386 this also means copying the function argument. > I do not quite understand the second variant, because the stack is not > guaranteed to be zeroed, and it is often not if reused after the previously > exited thread. > > The first variant is what I like, but perhaps we need to emulate the > frame as well, i.e. push two zero longs. > > Currently kernel does not access the usermode stack for the new thread > unless dictated by ABI (i.e. it does not touch it for 64bit process > on amd64, but have to for 32bit). I like this property. Also, the > previous paragraph is indicative: we do not really know in kernel > what ABI the userspace follows. It might want frame, may be it does > not need it. It could use other register than %rbp as the frame base, > etc. > >> >> Do you have any preference (or better alternatives), because I think I >> still prefer my first patch. It's the caller's job to set up the call >> frame, in this case the kernel. And if the kernel handles it then it >> also works with (hypothetical) implementations other than libthr. >> >>> Also crt1 probably should get the same treatment, despite we already set >>> %rbp to zero AFAIR. >> >> I haven't checked but I imagine the return address of the process entry >> point is always zero because the stack is all zeros. > Stack is not zero. The environment and argument strings and auxv are copied > at top, and at the bottom the ps_strings structure is located, so it > is not. > > If you commit your existing patch as is, I will not resent. But I do think > that stuff that can be done in usermode, should be done in usermode, esp. > when the amount of efforts is same. Attached what I have for libgcc. It can be applied to gcc5-8, should give no issues. The mentioned tc from this thread and mine, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82635 do pass. What do you think? Andreas
This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:41:13 UTC