Re: RFC: jemalloc: qdbus sigsegv in malloc_init

From: Konstantin Belousov <kostikbel_at_gmail.com>
Date: Mon, 21 May 2012 12:38:36 +0300
On Mon, May 21, 2012 at 10:54:54AM +0800, David Xu wrote:
> On 2012/5/21 1:24, Konstantin Belousov wrote:
> >On Sun, May 20, 2012 at 06:42:35PM +0200, Alberto Villa wrote:
> >>On Sun, May 20, 2012 at 8:03 AM, David Xu<listlog2011_at_gmail.com>  wrote:
> >>>qdbus segfaults on my machine too, I tracked it down, and found the 
> >>>problem
> >>>is in QT,
> >>>it deleted current_thread_data_key,  but it still uses it in some cxa 
> >>>hooks,
> >>>  I  applied the
> >>>following patch,  and it works fine.
> >>Thanks for the analysis David!
> >>
> >>>I think the bug depends on linking order in QT library ? if the
> >>>qthread_unix.cpp is linked
> >>>as lastest module, the key will be deleted after all cxa hooks run, then 
> >>>it
> >>>will be fine,
> >>>otherwise, it would crash.
> >>Is this really possible?
> >No, I do not think it is possible.
> >
> >The only possibility for something weird happen is for atexit/__cxa_atexit
> >functions to be registered from another atexit function, and then we
> >indeed could call the newly registered function too late.
> >
> >I wonder if the following hack makes any change in the observed behaviour.
> >
> >diff --git a/lib/libc/stdlib/atexit.c b/lib/libc/stdlib/atexit.c
> >index 511172a..bab850c 100644
> >--- a/lib/libc/stdlib/atexit.c
> >+++ b/lib/libc/stdlib/atexit.c
> >_at__at_ -72,6 +72,7 _at__at_ struct atexit {
> >  };
> >
> >  static struct atexit *__atexit;		/* points to head of LIFO 
> >  stack */
> >+static int atexit_gen;
> >
> >  /*
> >   * Register the function described by 'fptr' to be called at application
> >_at__at_ -107,6 +108,7 _at__at_ atexit_register(struct atexit_fn *fptr)
> >  		__atexit = p;
> >  	}
> >  	p->fns[p->ind++] = *fptr;
> >+	atexit_gen++;
> >  	_MUTEX_UNLOCK(&atexit_mutex);
> >  	return 0;
> >  }
> >_at__at_ -162,7 +164,7 _at__at_ __cxa_finalize(void *dso)
> >  	struct dl_phdr_info phdr_info;
> >  	struct atexit *p;
> >  	struct atexit_fn fn;
> >-	int n, has_phdr;
> >+	int atexit_gen_prev, n, has_phdr;
> >
> >  	if (dso != NULL)
> >  		has_phdr = _rtld_addr_phdr(dso,&phdr_info);
> >_at__at_ -170,6 +172,8 _at__at_ __cxa_finalize(void *dso)
> >  		has_phdr = 0;
> >
> >  	_MUTEX_LOCK(&atexit_mutex);
> >+retry:
> >+	atexit_gen_prev = atexit_gen;
> >  	for (p = __atexit; p; p = p->next) {
> >  		for (n = p->ind; --n>= 0;) {
> >  			if (p->fns[n].fn_type == ATEXIT_FN_EMPTY)
> >_at__at_ -196,6 +200,8 _at__at_ __cxa_finalize(void *dso)
> >  			_MUTEX_LOCK(&atexit_mutex);
> >  		}
> >  	}
> >+	if (atexit_gen_prev != atexit_gen)
> >+		goto retry;
> >  	_MUTEX_UNLOCK(&atexit_mutex);
> >  	if (dso == NULL)
> >  		_MUTEX_DESTROY(&atexit_mutex);
> I have tried your patch,  it does not fix the problem. As I said, it is 
> a bug in QT,
> the bug is pthread key current_thread_data_key is deleted by a global 
> C++ object
> too early, other C++ global objects still need this pthread key. The 
> following procedure
> shows how I found the problem:
> 
> davidxu_at_xyf:~%gdb qdbus
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain 
> conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "i386-marcel-freebsd"...(no debugging symbols 
> found)...
> (gdb) break __cxa_finalize
> Function "__cxa_finalize" not defined.
> Make breakpoint pending on future shared library load? (y or [n]) y
> Breakpoint 1 (__cxa_finalize) pending.
> (gdb) run
> Starting program: /usr/local/bin/qdbus
> (no debugging symbols found)...(no debugging symbols found)...(no 
> debugging symbols found)...(no debugging symbols found)...(no debugging 
> symbols found)...(no debugging symbols found)...(no debugging symbols 
> found)...[New LWP 100077]
> (no debugging symbols found)...(no debugging symbols found)...(no 
> debugging symbols found)...(no debugging symbols found)...(no debugging 
> symbols found)...(no debugging symbols found)...(no debugging symbols 
> found)...(no debugging symbols found)...Breakpoint 2 at 0x2864ac26
> Pending breakpoint "__cxa_finalize" resolved
> (no debugging symbols found)...[New Thread 29007300 (LWP 100077/qdbus)]
> (no debugging symbols found)...:1.0
>  org.gnome.SessionManager
> :1.11
> :1.111
> :1.12
> :1.13
>  org.gtk.vfs.Daemon
> :1.143
> :1.15
>  org.pulseaudio.Server
> :1.17
>  org.gnome.Panel
> :1.18
> :1.19
> :1.20
>  org.gtk.Private.HalVolumeMonitor
> :1.21
>  org.gtk.Private.GPhoto2VolumeMonitor
> :1.22
> :1.24
>  org.gnome.ScreenSaver
> :1.25
> :1.27
> :1.28
> :1.29
> :1.30
> :1.31
>  org.gnome.panel.applet.WnckletFactory
> :1.32
> :1.33
> :1.34
> :1.35
>  org.gnome.panel.applet.CPUFreqAppletFactory
> :1.36
>  org.gnome.panel.applet.NotificationAreaAppletFactory
> :1.37
>  org.gnome.panel.applet.MultiLoadAppletFactory
> :1.38
> :1.39
> :1.4
>  org.gnome.GConf
> :1.41
>  org.gnome.panel.applet.ClockAppletFactory
> :1.49
> :1.5
>  org.gnome.SettingsDaemon
> :1.50
> :1.53
> :1.64
> :1.7
>  org.freedesktop.secrets
>  org.gnome.keyring
> :1.75
>  org.gtk.vfs.Metadata
> :1.76
>  org.gnome.Terminal.Display_0_0
> :1.77
> org.freedesktop.DBus
> [Switching to Thread 29007300 (LWP 100077/qdbus)]
> 
> Breakpoint 2, 0x2864ac26 in __cxa_finalize () from /lib/libc.so.7
> (gdb) print current_thread_data_key
> $1 = 0
> (gdb) thread tsd
> Key 0, destructor 0x281d77f0 <_Z27destroy_current_thread_dataPv>
> Key 1, destructor 0x28732dc0 <g_thread_create_full>
> Key 2, destructor 0x28726a00 <g_slice_get_config>
> Key 3, destructor 0x0 <???>
> 
> 
> Here you can find that the function destroy_current_thread_data()  is 
> registered.
> 
> 
> (gdb) bt
> #0  0x2864ac26 in __cxa_finalize () from /lib/libc.so.7
> #1  0x285efe1a in exit () from /lib/libc.so.7
> #2  0x08051db5 in main ()
> (gdb) break QThreadData::current()
> Breakpoint 3 at 0x281d7856
> (gdb) info breakpoints
> Num Type           Disp Enb Address    What
> 2   breakpoint     keep y   0x2864ac26 <__cxa_finalize+6>
>     breakpoint already hit 1 time
> 3   breakpoint     keep y   0x281d7856 <QThreadData::current()+6>
> (gdb) delete 2
> (gdb) cont
> Continuing.
> 
> Breakpoint 3, 0x281d7856 in QThreadData::current () from 
> /usr/local/lib/qt4/libQtCore.so.4
> (gdb) bt
> #0  0x281d7856 in QThreadData::current () from 
> /usr/local/lib/qt4/libQtCore.so.4
> #1  0x281d4747 in QThread::currentThread () from 
> /usr/local/lib/qt4/libQtCore.so.4
> #2  0x28097248 in QDBusConnectionPrivate::deleteYourself () from 
> /usr/local/lib/qt4/libQtDBus.so.4
> #3  0x2808f2ea in QDBusConnection::~QDBusConnection () from 
> /usr/local/lib/qt4/libQtDBus.so.4
> #4  0x2864ad8f in __cxa_finalize () from /lib/libc.so.7
> #5  0x285efe1a in exit () from /lib/libc.so.7
> #6  0x08051db5 in main ()
> (gdb) thread tsd
> Key 1, destructor 0x0 <???>
> Key 2, destructor 0x0 <???>
> Key 3, destructor 0x0 <???>
> 
> Here you can see the destroy_current_thread_data() was executed, and 
> unregistered.
> the key current_thread_key_data which is index 0 is deleted.
> 
> 
> (gdb) cont
> Continuing.
> 
> Breakpoint 3, 0x281d7856 in QThreadData::current () from 
> /usr/local/lib/qt4/libQtCore.so.4
> (gdb) bt
> #0  0x281d7856 in QThreadData::current () from 
> /usr/local/lib/qt4/libQtCore.so.4
> #1  0x282f58e3 in QObject::QObject () from /usr/local/lib/qt4/libQtCore.so.4
> #2  0x281d4710 in QThread::QThread () from /usr/local/lib/qt4/libQtCore.so.4
> #3  0x281d5a9e in QAdoptedThread::QAdoptedThread () from 
> /usr/local/lib/qt4/libQtCore.so.4
> #4  0x281d7934 in QThreadData::current () from 
> /usr/local/lib/qt4/libQtCore.so.4
> #5  0x281d4747 in QThread::currentThread () from 
> /usr/local/lib/qt4/libQtCore.so.4
> #6  0x28097248 in QDBusConnectionPrivate::deleteYourself () from 
> /usr/local/lib/qt4/libQtDBus.so.4
> #7  0x2808f2ea in QDBusConnection::~QDBusConnection () from 
> /usr/local/lib/qt4/libQtDBus.so.4
> #8  0x2864ad8f in __cxa_finalize () from /lib/libc.so.7
> #9  0x285efe1a in exit () from /lib/libc.so.7
> #10 0x08051db5 in main ()
> 
> now the stupid code starts to create a new thread...
> 
> (gdb) cont
> Continuing.
> 
> Breakpoint 3, 0x281d7856 in QThreadData::current () from 
> /usr/local/lib/qt4/libQtCore.so.4
> (gdb) bt
> #0  0x281d7856 in QThreadData::current () from 
> /usr/local/lib/qt4/libQtCore.so.4
> #1  0x282f58e3 in QObject::QObject () from /usr/local/lib/qt4/libQtCore.so.4
> #2  0x281d4710 in QThread::QThread () from /usr/local/lib/qt4/libQtCore.so.4
> #3  0x281d5a9e in QAdoptedThread::QAdoptedThread () from 
> /usr/local/lib/qt4/libQtCore.so.4
> #4  0x281d7934 in QThreadData::current () from 
> /usr/local/lib/qt4/libQtCore.so.4
> #5  0x282f58e3 in QObject::QObject () from /usr/local/lib/qt4/libQtCore.so.4
> #6  0x281d4710 in QThread::QThread () from /usr/local/lib/qt4/libQtCore.so.4
> #7  0x281d5a9e in QAdoptedThread::QAdoptedThread () from 
> /usr/local/lib/qt4/libQtCore.so.4
> #8  0x281d7934 in QThreadData::current () from 
> /usr/local/lib/qt4/libQtCore.so.4
> #9  0x281d4747 in QThread::currentThread () from 
> /usr/local/lib/qt4/libQtCore.so.4
> #10 0x28097248 in QDBusConnectionPrivate::deleteYourself () from 
> /usr/local/lib/qt4/libQtDBus.so.4
> #11 0x2808f2ea in QDBusConnection::~QDBusConnection () from 
> /usr/local/lib/qt4/libQtDBus.so.4
> #12 0x2864ad8f in __cxa_finalize () from /lib/libc.so.7
> #13 0x285efe1a in exit () from /lib/libc.so.7
> #14 0x08051db5 in main ()
> (gdb)
> #0  0x281d7856 in QThreadData::current () from 
> /usr/local/lib/qt4/libQtCore.so.4
> #1  0x282f58e3 in QObject::QObject () from /usr/local/lib/qt4/libQtCore.so.4
> #2  0x281d4710 in QThread::QThread () from /usr/local/lib/qt4/libQtCore.so.4
> #3  0x281d5a9e in QAdoptedThread::QAdoptedThread () from 
> /usr/local/lib/qt4/libQtCore.so.4
> #4  0x281d7934 in QThreadData::current () from 
> /usr/local/lib/qt4/libQtCore.so.4
> #5  0x282f58e3 in QObject::QObject () from /usr/local/lib/qt4/libQtCore.so.4
> #6  0x281d4710 in QThread::QThread () from /usr/local/lib/qt4/libQtCore.so.4
> #7  0x281d5a9e in QAdoptedThread::QAdoptedThread () from 
> /usr/local/lib/qt4/libQtCore.so.4
> #8  0x281d7934 in QThreadData::current () from 
> /usr/local/lib/qt4/libQtCore.so.4
> #9  0x281d4747 in QThread::currentThread () from 
> /usr/local/lib/qt4/libQtCore.so.4
> #10 0x28097248 in QDBusConnectionPrivate::deleteYourself () from 
> /usr/local/lib/qt4/libQtDBus.so.4
> #11 0x2808f2ea in QDBusConnection::~QDBusConnection () from 
> /usr/local/lib/qt4/libQtDBus.so.4
> #12 0x2864ad8f in __cxa_finalize () from /lib/libc.so.7
> #13 0x285efe1a in exit () from /lib/libc.so.7
> #14 0x08051db5 in main ()
> (gdb)
> 
> dead-loop in QT library until the stack overflow.
> 
> As I said, it depends on ordering the global objects are destructed, if 
> the object which deleting
> the current_thread_data_key is destructed lastly, the problem wont 
> happen, but now
> it is destructed too early. I believe there is no specification said 
> that which C++ object should be
> destructed first if they are in different compiled module and then are 
> linked together to generated
> a shared object, .so file.

What you presented is quite convincing, I agree with the analysis.

My previous note about 'cannot happen' was about __pthread_cxa_finalize()
being called before all atexit hooks for given dso are executed. And patch
should fix one corner case there, which is apparently not relevant to the
problem in hand.

Standard for C++ only mandates order of ctr/dtr calls for objects
defined in single compilation unit. There is gcc extension that indeed
allows to specify order of constructor/destructor calls for objects
contained in the given dso regardless of their location in source code,
see init_priority() object attribute.

Received on Mon May 21 2012 - 07:38:50 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:27 UTC