Re: RFC: jemalloc: qdbus sigsegv in malloc_init

From: Jason Evans <jasone_at_freebsd.org>
Date: Mon, 30 Apr 2012 12:34:32 -0700
On Apr 30, 2012, at 7:13 AM, Gustau Pérez i Querol wrote:
>  the kde team is seeing some strange problems with the new version (4.8.1) of devel/dbus-qt4 with current. It does work with stable. I also suspect that the problem described below is affecting the experimental cinnamon port (an alternative to gnome3, possible replacement of gnome2).
> 
>  The problem happens with both i386 and amd64 with empty /etc/malloc.conf and simple /etc/make.conf. Everything compiled with base gcc (no clang). The kernel was compiled with no debug support, but it can enable if needed. There are reports from avilla_at_freebsd.org of the same behavior with clang compiled world and kernel and with   MALLOC_PRODUCTION=yes.
> 
> When qdbus starts, it segfauts. The backtrace of the problem with r234769 can be found here: http://pastebin.com/ryBXtqGF. When starting the qdbus daemon by hand in a X+twm session, we see it calls calloc many times and after a fixed number of times segfaults. We see it segfaults at rb_gen (a quite large macro defined at $SRC_BASE/contrib/jemalloc/include/jemalloc/internal/rb.h).
> 
> If the daemon is started by hand, I'm able to skip all the calls qdbus makes to calloc till the one causing the segfault. At that point, at rb_gen, we don't exactly know what is going on or how to debug the macro. Ktrace are available, but we were unable to find anything new from them.
> 
>  With old versions of current before the jemalloc imports (as of March 30th) the daemon segfaulted at malloc.c:2426. With revisions during April 20 to 24th (can be more precise, it was during the jemalloc imports) the daemon segfaulted at malloc_init. Bts are available if needed, and if necessary I can go back to those revision and recompile world+kernel to see its behavior.
> 
>  Any help from freebsd-current_at_ (perhaps Jason Evans can help us) will be appreciated. Any additional info, like source revisions, can be provided. I would like to stress that the experimental devel/dbus-qt4 works fine with recent stable.

The crash is happening in page run management, so there is some pretty bad memory corruption going on by the time of the crash.  If I understand you correctly, you have reproduced the crash on a system that does *not* have MALLOC_PRODUCTION defined, which means that none of the assertions in jemalloc caught the problem.

Adrian Chadd made the excellent suggestion of trying valgrind; it's likely to point out the problem almost immediately.  If that doesn't work, the utrace functionality in malloc may help you figure out what activity has occurred by the time of the crash, and give you a better understanding of what happened to memory around the address that is involved in the crash.

Jason
Received on Mon Apr 30 2012 - 17:34:31 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:26 UTC