Re: Segfault in _Unwind_* code called from pthread_exit

From: Tijl Coosemans <tijl_at_FreeBSD.org>
Date: Fri, 25 Aug 2017 17:38:51 +0200

On Thu, 24 Aug 2017 18:08:30 +0200 Tijl Coosemans <tijl_at_FreeBSD.org> wrote:
> On Thu, 24 Aug 2017 18:42:35 +0300 Konstantin Belousov <kostikbel_at_gmail.com> wrote:
>> On Wed, Aug 23, 2017 at 04:37:07PM +0200, Tijl Coosemans wrote:  
>>> The following program segfaults for me on amd64 when linked like this:
>>> 
>>> cc -o test test.c -lpthread -L/usr/local/lib/gcc5 -lgcc_s -rpath /usr/local/lib/gcc5
>>> 
>>> --------------------------------
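>>> #include <pthread.h>
>>> #include <stdio.h>
>>> #include <string.h>
>>> 
>>> /* Thread body: returns immediately, so libthr calls pthread_exit(),
>>>  * which unwinds the thread's stack via _Unwind_ForcedUnwind. */
>>> void *
>>> thr( void *arg ) {
>>> 	(void)arg;
>>> 	return( NULL );
>>> }
>>> 
>>> int
>>> main( void ) {
>>> 	pthread_t thread;
>>> 	int error;
>>> 
>>> 	/* Create and join one thread at a time, so that later threads
>>> 	 * may reuse the cached stack of an earlier one. */
>>> 	for( int i = 1; i < 20; i++ ) {
>>> 		fprintf( stderr, "%d\n", i );
>>> 		error = pthread_create( &thread, NULL, thr, NULL );
>>> 		if( error != 0 ) {
>>> 			fprintf( stderr, "pthread_create: %s\n", strerror( error ) );
>>> 			return( 1 );
>>> 		}
>>> 		pthread_join( thread, NULL );
>>> 	}
>>> 	return( 0 );
>>> }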
>>> --------------------------------
>>> 
>>> The backtrace looks like this:
>>> 
>>> Thread 7 received signal SIGSEGV, Segmentation fault.
>>> [Switching to LWP 100511 of process 1886]
>>> uw_frame_state_for (context=context@entry=0x7fffdfffddc0, 
>>>     fs=fs@entry=0x7fffdfffdb10)
>>>     at /usr/ports/lang/gcc5/work/gcc-5.4.0/libgcc/unwind-dw2.c:1249
>>> 1249	/usr/ports/lang/gcc5/work/gcc-5.4.0/libgcc/unwind-dw2.c: No such file or directory.
>>> (gdb) bt
>>> #0  uw_frame_state_for (context=context@entry=0x7fffdfffddc0, 
>>>     fs=fs@entry=0x7fffdfffdb10)
>>>     at /usr/ports/lang/gcc5/work/gcc-5.4.0/libgcc/unwind-dw2.c:1249
>>> #1  0x0000000800a66ecb in _Unwind_ForcedUnwind_Phase2 (
>>>     exc=exc@entry=0x800658730, context=context@entry=0x7fffdfffddc0)
>>>     at /usr/ports/lang/gcc5/work/gcc-5.4.0/libgcc/unwind.inc:155
>>> #2  0x0000000800a67200 in _Unwind_ForcedUnwind (exc=0x800658730, 
>>>     stop=0x8008428b0 <thread_unwind_stop>, stop_argument=0x0)
>>>     at /usr/ports/lang/gcc5/work/gcc-5.4.0/libgcc/unwind.inc:207
>>> #3  0x0000000800842224 in _Unwind_ForcedUnwind (ex=0x800658730, 
>>>     stop_func=0x8008428b0 <thread_unwind_stop>, stop_arg=0x0)
>>>     at /usr/src/lib/libthr/thread/thr_exit.c:106
>>> #4  0x000000080084269f in thread_unwind ()
>>>     at /usr/src/lib/libthr/thread/thr_exit.c:172
>>> #5  0x00000008008424d6 in _pthread_exit_mask (status=0x0, mask=0x0)
>>>     at /usr/src/lib/libthr/thread/thr_exit.c:254
>>> #6  0x0000000800842359 in _pthread_exit (status=0x0)
>>>     at /usr/src/lib/libthr/thread/thr_exit.c:206
>>> #7  0x000000080082ccb1 in thread_start (curthread=0x800658500)
>>>     at /usr/src/lib/libthr/thread/thr_create.c:289
>>> #8  0x00007fffdfdfe000 in ?? ()
>>> Backtrace stopped: Cannot access memory at address 0x7fffdfffe000
>>> 
>>> 
>>> It happens with gcc6 as well, but not with base libgcc_s.
>>> Can anyone reproduce this?  Have there been any changes to stack
>>> unwinding recently (last few months)?    
>> 
>> I can reproduce this, and it seems there was a change in the gcc unwinder.
>> Below is a patch which I did not even compile.  Still, it should give
>> an idea of how it might be approached.  The patch is against gcc head.
> 
> Currently I'm thinking of patching our cpu_set_upcall in vm_machdep.c to
> set the return address for the thread entry point (#8 in the backtrace
> above) to NULL.  For new stacks this is implicitly NULL, but "Thread 7"
> (as gdb calls it) uses a recycled stack, and libthr stores a 'struct
> stack' at the end of such stacks (to keep them in a linked list).  I'm
> still looking at how base libgcc_s, which uses LLVM libunwind, avoids
> this problem.
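The recycled-stack situation can be modelled in user space.  This is a
simplified sketch, not libthr code: `struct spare_stack`, `cache_stack()`
and `read_entry_ra_slot()` are hypothetical, and it assumes the word an
unwinder takes as the entry frame's return address is the last
pointer-sized slot below the top of the stack.  A fresh, zero-filled
stack yields 0 there; a stack that was cached with a bookkeeping record
at its top yields whatever that record left behind.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical bookkeeping record kept at the top of a cached stack.
 * NOT libthr's definition, only a stand-in with a similar role. */
struct spare_stack {
	struct spare_stack *next;	/* free-list link */
	size_t stacksize;
	void *stackaddr;
};

/* Read the word that, in this model, an unwinder would take as the
 * return address of the thread entry frame: the last pointer-sized
 * slot below the top of the stack.  memcpy avoids aliasing issues. */
static uintptr_t
read_entry_ra_slot(const unsigned char *stack, size_t size)
{
	uintptr_t ra;

	memcpy(&ra, stack + size - sizeof(ra), sizeof(ra));
	return ra;
}

/* Put a stack on the cache by storing its record at the top, as a
 * recycling allocator might.  This overwrites the return-address slot. */
static void
cache_stack(unsigned char *stack, size_t size, struct spare_stack **head)
{
	struct spare_stack rec;

	assert(size >= sizeof(rec));
	rec.next = *head;
	rec.stacksize = size;
	rec.stackaddr = stack;
	memcpy(stack + size - sizeof(rec), &rec, sizeof(rec));
	*head = (struct spare_stack *)(stack + size - sizeof(rec));
}
```

In this model the slot ends up holding the record's last field; in
libthr what lands there depends on the real 'struct stack' layout and
on how the kernel aligns the initial stack pointer.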

So both GCC and LLVM unwinding look up the return address in the CFI
(call frame information) table and fail when the return address is
garbage, but LLVM treats this as an end-of-stack condition, while GCC
additionally checks whether the return address points to a signal
trampoline by reading the instruction bytes at that address.  On amd64
the garbage address is unreadable, so this read segfaults.  On i386 it
is readable, the test fails, and GCC returns end-of-stack.

To fix the crash and get predictable behaviour in the other cases, I
propose always setting the return address to 0.  The attached patch does
this for i386 and amd64.  I don't know whether other architectures need a
similar patch.
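The shape of the amd64 change can be sketched in user space on an
ordinary buffer standing in for the thread stack.  This is neither the
attached patch nor kernel code: `setup_entry_stack()` is a hypothetical
helper.  It assumes the amd64 SysV ABI convention that on function entry
%rsp + 8 is 16-byte aligned (the call having pushed an 8-byte return
address), and it writes an explicit 0 into that return-address slot
instead of relying on the stack being freshly zero-filled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Compute the initial stack pointer for a new thread whose stack is
 * [base, base + size), as if the entry function had just been called,
 * and store a zero return address in the slot it points to.
 * Hypothetical helper modelling the amd64 case only. */
static uintptr_t
setup_entry_stack(unsigned char *base, size_t size)
{
	uintptr_t sp;
	uint64_t zero_ra = 0;

	assert(size >= 32);	/* room for alignment plus the slot */
	sp = ((uintptr_t)base + size) & ~(uintptr_t)0x0f;
	sp -= sizeof(zero_ra);	/* the 8-byte return address slot */
	/* Overwrite whatever a previous user left here (e.g. cache
	 * bookkeeping) so an unwinder sees a zero return address. */
	memcpy((void *)sp, &zero_ra, sizeof(zero_ra));
	return sp;
}
```

On i386 the arrangement differs, since the entry function's argument is
passed on the stack above the return address; the sketch does not cover
that case.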