Re: RFC: jemalloc: qdbus sigsegv in malloc_init

From: David Xu <listlog2011_at_gmail.com>
Date: Mon, 21 May 2012 11:28:09 +0800
On 2012/5/21 10:54, David Xu wrote:
> On 2012/5/21 1:24, Konstantin Belousov wrote:
>> On Sun, May 20, 2012 at 06:42:35PM +0200, Alberto Villa wrote:
>>> On Sun, May 20, 2012 at 8:03 AM, David Xu<listlog2011_at_gmail.com>  
>>> wrote:
>>>> qdbus segfaults on my machine too, I tracked it down, and found the 
>>>> problem
>>>> is in QT,
>>>> it deleted current_thread_data_key,  but it still uses it in some 
>>>> cxa hooks,
>>>>   I  applied the
>>>> following patch,  and it works fine.
>>> Thanks for the analysis David!
>>>
>>>> I think the bug depends on linking order in QT library ? if the
>>>> qthread_unix.cpp is linked
>>>> as lastest module, the key will be deleted after all cxa hooks run, 
>>>> then it
>>>> will be fine,
>>>> otherwise, it would crash.
>>> Is this really possible?
>> No, I do not think it is possible.
>>
>> The only possibility for something weird happen is for 
>> atexit/__cxa_atexit
>> functions to be registered from another atexit function, and then we
>> indeed could call the newly registered function too late.
>>
>> I wonder if the following hack makes any change in the observed 
>> behaviour.
>>
>> diff --git a/lib/libc/stdlib/atexit.c b/lib/libc/stdlib/atexit.c
>> index 511172a..bab850c 100644
>> --- a/lib/libc/stdlib/atexit.c
>> +++ b/lib/libc/stdlib/atexit.c
>> _at__at_ -72,6 +72,7 _at__at_ struct atexit {
>>   };
>>
>>   static struct atexit *__atexit;        /* points to head of LIFO 
>> stack */
>> +static int atexit_gen;
>>
>>   /*
>>    * Register the function described by 'fptr' to be called at 
>> application
>> _at__at_ -107,6 +108,7 _at__at_ atexit_register(struct atexit_fn *fptr)
>>           __atexit = p;
>>       }
>>       p->fns[p->ind++] = *fptr;
>> +    atexit_gen++;
>>       _MUTEX_UNLOCK(&atexit_mutex);
>>       return 0;
>>   }
>> _at__at_ -162,7 +164,7 _at__at_ __cxa_finalize(void *dso)
>>       struct dl_phdr_info phdr_info;
>>       struct atexit *p;
>>       struct atexit_fn fn;
>> -    int n, has_phdr;
>> +    int atexit_gen_prev, n, has_phdr;
>>
>>       if (dso != NULL)
>>           has_phdr = _rtld_addr_phdr(dso,&phdr_info);
>> _at__at_ -170,6 +172,8 _at__at_ __cxa_finalize(void *dso)
>>           has_phdr = 0;
>>
>>       _MUTEX_LOCK(&atexit_mutex);
>> +retry:
>> +    atexit_gen_prev = atexit_gen;
>>       for (p = __atexit; p; p = p->next) {
>>           for (n = p->ind; --n>= 0;) {
>>               if (p->fns[n].fn_type == ATEXIT_FN_EMPTY)
>> _at__at_ -196,6 +200,8 _at__at_ __cxa_finalize(void *dso)
>>               _MUTEX_LOCK(&atexit_mutex);
>>           }
>>       }
>> +    if (atexit_gen_prev != atexit_gen)
>> +        goto retry;
>>       _MUTEX_UNLOCK(&atexit_mutex);
>>       if (dso == NULL)
>>           _MUTEX_DESTROY(&atexit_mutex);
> I have tried your patch,  it does not fix the problem. As I said, it 
> is a bug in QT,
> the bug is pthread key current_thread_data_key is deleted by a global 
> C++ object
> too early, other C++ global objects still need this pthread key. The 
> following procedure
> shows how I found the problem:
>
> davidxu_at_xyf:~%gdb qdbus
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and 
> you are
> welcome to change it and/or distribute copies of it under certain 
> conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for 
> details.
> This GDB was configured as "i386-marcel-freebsd"...(no debugging 
> symbols found)...
> (gdb) break __cxa_finalize
> Function "__cxa_finalize" not defined.
> Make breakpoint pending on future shared library load? (y or [n]) y
> Breakpoint 1 (__cxa_finalize) pending.
> (gdb) run
> Starting program: /usr/local/bin/qdbus
> (no debugging symbols found)...(no debugging symbols found)...(no 
> debugging symbols found)...(no debugging symbols found)...(no 
> debugging symbols found)...(no debugging symbols found)...(no 
> debugging symbols found)...[New LWP 100077]
> (no debugging symbols found)...(no debugging symbols found)...(no 
> debugging symbols found)...(no debugging symbols found)...(no 
> debugging symbols found)...(no debugging symbols found)...(no 
> debugging symbols found)...(no debugging symbols found)...Breakpoint 2 
> at 0x2864ac26
> Pending breakpoint "__cxa_finalize" resolved
> (no debugging symbols found)...[New Thread 29007300 (LWP 100077/qdbus)]
> (no debugging symbols found)...:1.0
>  org.gnome.SessionManager
> :1.11
> :1.111
> :1.12
> :1.13
>  org.gtk.vfs.Daemon
> :1.143
> :1.15
>  org.pulseaudio.Server
> :1.17
>  org.gnome.Panel
> :1.18
> :1.19
> :1.20
>  org.gtk.Private.HalVolumeMonitor
> :1.21
>  org.gtk.Private.GPhoto2VolumeMonitor
> :1.22
> :1.24
>  org.gnome.ScreenSaver
> :1.25
> :1.27
> :1.28
> :1.29
> :1.30
> :1.31
>  org.gnome.panel.applet.WnckletFactory
> :1.32
> :1.33
> :1.34
> :1.35
>  org.gnome.panel.applet.CPUFreqAppletFactory
> :1.36
>  org.gnome.panel.applet.NotificationAreaAppletFactory
> :1.37
>  org.gnome.panel.applet.MultiLoadAppletFactory
> :1.38
> :1.39
> :1.4
>  org.gnome.GConf
> :1.41
>  org.gnome.panel.applet.ClockAppletFactory
> :1.49
> :1.5
>  org.gnome.SettingsDaemon
> :1.50
> :1.53
> :1.64
> :1.7
>  org.freedesktop.secrets
>  org.gnome.keyring
> :1.75
>  org.gtk.vfs.Metadata
> :1.76
>  org.gnome.Terminal.Display_0_0
> :1.77
> org.freedesktop.DBus
> [Switching to Thread 29007300 (LWP 100077/qdbus)]
>
> Breakpoint 2, 0x2864ac26 in __cxa_finalize () from /lib/libc.so.7
> (gdb) print current_thread_data_key
> $1 = 0
> (gdb) thread tsd
> Key 0, destructor 0x281d77f0 <_Z27destroy_current_thread_dataPv>
> Key 1, destructor 0x28732dc0 <g_thread_create_full>
> Key 2, destructor 0x28726a00 <g_slice_get_config>
> Key 3, destructor 0x0 <???>
>
>
> Here you can find that the function destroy_current_thread_data()  is 
> registered.
>
>
> (gdb) bt
> #0  0x2864ac26 in __cxa_finalize () from /lib/libc.so.7
> #1  0x285efe1a in exit () from /lib/libc.so.7
> #2  0x08051db5 in main ()
> (gdb) break QThreadData::current()
> Breakpoint 3 at 0x281d7856
> (gdb) info breakpoints
> Num Type           Disp Enb Address    What
> 2   breakpoint     keep y   0x2864ac26 <__cxa_finalize+6>
>     breakpoint already hit 1 time
> 3   breakpoint     keep y   0x281d7856 <QThreadData::current()+6>
> (gdb) delete 2
> (gdb) cont
> Continuing.
>
> Breakpoint 3, 0x281d7856 in QThreadData::current () from 
> /usr/local/lib/qt4/libQtCore.so.4
> (gdb) bt
> #0  0x281d7856 in QThreadData::current () from 
> /usr/local/lib/qt4/libQtCore.so.4
> #1  0x281d4747 in QThread::currentThread () from 
> /usr/local/lib/qt4/libQtCore.so.4
> #2  0x28097248 in QDBusConnectionPrivate::deleteYourself () from 
> /usr/local/lib/qt4/libQtDBus.so.4
> #3  0x2808f2ea in QDBusConnection::~QDBusConnection () from 
> /usr/local/lib/qt4/libQtDBus.so.4
> #4  0x2864ad8f in __cxa_finalize () from /lib/libc.so.7
> #5  0x285efe1a in exit () from /lib/libc.so.7
> #6  0x08051db5 in main ()
> (gdb) thread tsd
> Key 1, destructor 0x0 <???>
> Key 2, destructor 0x0 <???>
> Key 3, destructor 0x0 <???>
>
> Here you can see the destroy_current_thread_data() was executed, and 
> unregistered.
> the key current_thread_key_data which is index 0 is deleted.
>
>
> (gdb) cont
> Continuing.
>
> Breakpoint 3, 0x281d7856 in QThreadData::current () from 
> /usr/local/lib/qt4/libQtCore.so.4
> (gdb) bt
> #0  0x281d7856 in QThreadData::current () from 
> /usr/local/lib/qt4/libQtCore.so.4
> #1  0x282f58e3 in QObject::QObject () from 
> /usr/local/lib/qt4/libQtCore.so.4
> #2  0x281d4710 in QThread::QThread () from 
> /usr/local/lib/qt4/libQtCore.so.4
> #3  0x281d5a9e in QAdoptedThread::QAdoptedThread () from 
> /usr/local/lib/qt4/libQtCore.so.4
> #4  0x281d7934 in QThreadData::current () from 
> /usr/local/lib/qt4/libQtCore.so.4
> #5  0x281d4747 in QThread::currentThread () from 
> /usr/local/lib/qt4/libQtCore.so.4
> #6  0x28097248 in QDBusConnectionPrivate::deleteYourself () from 
> /usr/local/lib/qt4/libQtDBus.so.4
> #7  0x2808f2ea in QDBusConnection::~QDBusConnection () from 
> /usr/local/lib/qt4/libQtDBus.so.4
> #8  0x2864ad8f in __cxa_finalize () from /lib/libc.so.7
> #9  0x285efe1a in exit () from /lib/libc.so.7
> #10 0x08051db5 in main ()
>
> now the stupid code starts to create a new thread...
>
> (gdb) cont
> Continuing.
>
> Breakpoint 3, 0x281d7856 in QThreadData::current () from 
> /usr/local/lib/qt4/libQtCore.so.4
> (gdb) bt
> #0  0x281d7856 in QThreadData::current () from 
> /usr/local/lib/qt4/libQtCore.so.4
> #1  0x282f58e3 in QObject::QObject () from 
> /usr/local/lib/qt4/libQtCore.so.4
> #2  0x281d4710 in QThread::QThread () from 
> /usr/local/lib/qt4/libQtCore.so.4
> #3  0x281d5a9e in QAdoptedThread::QAdoptedThread () from 
> /usr/local/lib/qt4/libQtCore.so.4
> #4  0x281d7934 in QThreadData::current () from 
> /usr/local/lib/qt4/libQtCore.so.4
> #5  0x282f58e3 in QObject::QObject () from 
> /usr/local/lib/qt4/libQtCore.so.4
> #6  0x281d4710 in QThread::QThread () from 
> /usr/local/lib/qt4/libQtCore.so.4
> #7  0x281d5a9e in QAdoptedThread::QAdoptedThread () from 
> /usr/local/lib/qt4/libQtCore.so.4
> #8  0x281d7934 in QThreadData::current () from 
> /usr/local/lib/qt4/libQtCore.so.4
> #9  0x281d4747 in QThread::currentThread () from 
> /usr/local/lib/qt4/libQtCore.so.4
> #10 0x28097248 in QDBusConnectionPrivate::deleteYourself () from 
> /usr/local/lib/qt4/libQtDBus.so.4
> #11 0x2808f2ea in QDBusConnection::~QDBusConnection () from 
> /usr/local/lib/qt4/libQtDBus.so.4
> #12 0x2864ad8f in __cxa_finalize () from /lib/libc.so.7
> #13 0x285efe1a in exit () from /lib/libc.so.7
> #14 0x08051db5 in main ()
> (gdb)
> #0  0x281d7856 in QThreadData::current () from 
> /usr/local/lib/qt4/libQtCore.so.4
> #1  0x282f58e3 in QObject::QObject () from 
> /usr/local/lib/qt4/libQtCore.so.4
> #2  0x281d4710 in QThread::QThread () from 
> /usr/local/lib/qt4/libQtCore.so.4
> #3  0x281d5a9e in QAdoptedThread::QAdoptedThread () from 
> /usr/local/lib/qt4/libQtCore.so.4
> #4  0x281d7934 in QThreadData::current () from 
> /usr/local/lib/qt4/libQtCore.so.4
> #5  0x282f58e3 in QObject::QObject () from 
> /usr/local/lib/qt4/libQtCore.so.4
> #6  0x281d4710 in QThread::QThread () from 
> /usr/local/lib/qt4/libQtCore.so.4
> #7  0x281d5a9e in QAdoptedThread::QAdoptedThread () from 
> /usr/local/lib/qt4/libQtCore.so.4
> #8  0x281d7934 in QThreadData::current () from 
> /usr/local/lib/qt4/libQtCore.so.4
> #9  0x281d4747 in QThread::currentThread () from 
> /usr/local/lib/qt4/libQtCore.so.4
> #10 0x28097248 in QDBusConnectionPrivate::deleteYourself () from 
> /usr/local/lib/qt4/libQtDBus.so.4
> #11 0x2808f2ea in QDBusConnection::~QDBusConnection () from 
> /usr/local/lib/qt4/libQtDBus.so.4
> #12 0x2864ad8f in __cxa_finalize () from /lib/libc.so.7
> #13 0x285efe1a in exit () from /lib/libc.so.7
> #14 0x08051db5 in main ()
> (gdb)
>
> dead-loop in QT library until the stack overflow.
>
> As I said, it depends on ordering the global objects are destructed, 
> if the object which deleting
> the current_thread_data_key is destructed lastly, the problem wont 
> happen, but now
> it is destructed too early. I believe there is no specification said 
> that which C++ object should be
> destructed first if they are in different compiled module and then are 
> linked together to generated
> a shared object, .so file.
>
>
Now let me dig into qthread_unix.cpp, see how QThreadData::current() works:

QThreadData *QThreadData::current()
{
     QThreadData *data = get_thread_data();
     if (!data) {
         void *a;
         if (QInternal::activateCallbacks(QInternal::AdoptCurrentThread, 
&a)) {
             QThread *adopted = static_cast<QThread*>(a);
             Q_ASSERT(adopted);
             data = QThreadData::get2(adopted);
             set_thread_data(data);
             adopted->d_func()->running = true;
             adopted->d_func()->finished = false;
             static_cast<QAdoptedThread *>(adopted)->init();
         } else {
             data = new QThreadData;
             QT_TRY {
                 set_thread_data(data);
                 data->thread = new QAdoptedThread(data);
             } QT_CATCH(...) {
                 clear_thread_data();
                 data->deref();
                 data = 0;
                 QT_RETHROW;
             }
             data->deref();
         }
         if (!QCoreApplicationPrivate::theMainThread)
             QCoreApplicationPrivate::theMainThread = data->thread;
     }
     return data;
}

it calls get_thread_data(), if it returns NULL, it create a new thread, 
and try to
set the new thread as "current thread data", it calls set_thread_data().

let's see how get_thread_data() and set_thread_data() work :

static QThreadData *get_thread_data()
{
#ifdef Q_OS_SYMBIAN
     return reinterpret_cast<QThreadData *>(Dll::Tls());
#else
     pthread_once(&current_thread_data_once, 
create_current_thread_data_key);
     return reinterpret_cast<QThreadData 
*>(pthread_getspecific(current_thread_data_key));
#endif
}

static void set_thread_data(QThreadData *data)
{
#ifdef Q_OS_SYMBIAN
     qt_symbian_throwIfError(Dll::SetTls(data));
#endif
     pthread_once(&current_thread_data_once, 
create_current_thread_data_key);
     pthread_setspecific(current_thread_data_key, data);
}


They just use pthread_getspecific and pthread_setspecific, the 
current_thread_data_key was only
created once which is guarded by pthread_once(), but as you know, the 
key has already
been deleted by Q_DESTRUCTOR_FUNCTION(destroy_current_thread_data_key) 
which is a global
object which has been destructed early, the key is no longer recreated, 
it is a stale key.

So pthread_setspecific should fail, and pthread_getspecific would return 
NULL, this causes
QThreadData::current()  create a new thread, it seems QAdoptedThread 
also calls QThreadData::current(),
so it is a recursion,  the recursion is never ended until stack overflow.
Received on Mon May 21 2012 - 01:28:15 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:27 UTC