Re: firefox3-bin crashes near arc4random_buf()

From: Tim Kientzle <kientzle_at_freebsd.org>
Date: Sun, 05 Oct 2008 17:34:22 -0700
> I watched it crash a bunch more times and the backtraces are the same. That's
> good, right? :-)

Yes.  For a suitable definition of "good."  ;-)

>>It might also be worth running it under ktrace,
>>forcing the crash, then sharing the last few dozen
>>lines from kdump output.
> 
> Also attached is firefox3.kdump. The last few lines look like:
> 
>   6855 firefox-bin RET   clock_gettime 0
>   6855 firefox-bin CALL  _umtx_op(0x8179760,0x8,0x1,0x8179740,0xbf8fdddc)
>   6855 firefox-bin PSIG  SIGSEGV caught handler=0x28237290 mask=0x0 code=0x1
>   6855 firefox-bin CALL  unlink(0x8179600)
>   6855 firefox-bin NAMI  "/home/jos/.mozilla/firefox/tosfxhak.default/lock"
>   6855 firefox-bin RET   unlink 0
>   6855 firefox-bin CALL  sigaction(SIGSEGV,0x2978dfb4,0)
>   6855 firefox-bin RET   sigaction 0
>   6855 firefox-bin CALL  sigprocmask(SIG_UNBLOCK,0xbf4f906c,0)
>   6855 firefox-bin RET   sigprocmask 0
>   6855 firefox-bin CALL  thr_kill(0x1878c,SIGSEGV)
>   6855 firefox-bin RET   thr_kill 0
>   6855 firefox-bin PSIG  SIGSEGV SIG_DFL
> 
> This to me suggests that the segfault happens inside _umtx_op. Am I reading
> that correctly?

Not necessarily.  Firefox is multi-threaded.  The thread that
called _umtx_op() is not the thread that crashed: there is no
RET record for _umtx_op(), so that thread had not returned to
userspace and was still blocked in the kernel.
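One way to see which system calls were still outstanding when the signal arrived is to scan the kdump output for CALL records that have no matching RET before the first PSIG record.  The C sketch below does that.  unreturned_calls() is a hypothetical helper, not part of kdump.  It assumes the default output format with no thread IDs, so it can only pair calls with returns by name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: scan kdump lines up to the first PSIG record and
 * return how many CALL records have no matching RET; their names are
 * copied into pending[].  Assumes default kdump output (no thread IDs),
 * so calls and returns are paired by syscall name only.
 */
size_t unreturned_calls(const char *const lines[], size_t n,
                        char pending[][64], size_t max_pending)
{
    size_t npending = 0;

    for (size_t i = 0; i < n; i++) {
        int pid;
        char comm[64], type[8], name[64];

        /* e.g. "  6855 firefox-bin CALL  _umtx_op(0x8179760,...)" */
        if (sscanf(lines[i], "%d %63s %7s %63[^( ]",
                   &pid, comm, type, name) != 4)
            continue;

        if (strcmp(type, "PSIG") == 0)
            break;                      /* stop at the first signal */

        if (strcmp(type, "CALL") == 0 && npending < max_pending) {
            strcpy(pending[npending++], name);
        } else if (strcmp(type, "RET") == 0) {
            /* Drop the most recent open call with the same name. */
            for (size_t j = npending; j-- > 0;) {
                if (strcmp(pending[j], name) == 0) {
                    memmove(&pending[j], &pending[j + 1],
                            (npending - j - 1) * sizeof pending[0]);
                    npending--;
                    break;
                }
            }
        }
    }
    return npending;
}
```

Fed the lines quoted above, it reports exactly one call still in progress at the PSIG record: _umtx_op.  The RET for clock_gettime has no CALL in the excerpt, so it is simply ignored.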

This does, however, answer one puzzle:  Firefox appears to
have a signal handler that catches SEGV, removes the lock
file, restores the default SEGV action, unblocks SEGV, and
then re-raises it (the unlink(), sigaction(), sigprocmask()
and thr_kill() calls) to actually kill the program.
That explains stack frames #0-#4 in your backtrace; that's
the signal handler executing after the segfault but before
the program is terminated.
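The handler pattern can be sketched in C as follows.  This is a guess at the shape of such a handler, reconstructed from the syscall sequence in the kdump output.  It is not Firefox's actual source.  The names lock_path, segv_cleanup_handler() and run_demo() are hypothetical, and raise() stands in for a real invalid memory access.  The demo runs in a forked child so the parent can check that the child died from SIGSEGV and that the lock file is gone.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical lock file path, filled in before the handler is installed. */
static char lock_path[256];

/* Clean up, then die from the same signal with the default action. */
static void segv_cleanup_handler(int sig)
{
    struct sigaction dfl;
    sigset_t unblock;

    /* 1. Release the lock file; unlink() is async-signal-safe. */
    unlink(lock_path);

    /* 2. Restore the default action (terminate and dump core). */
    memset(&dfl, 0, sizeof dfl);
    dfl.sa_handler = SIG_DFL;
    sigemptyset(&dfl.sa_mask);
    sigaction(sig, &dfl, NULL);

    /* 3. The signal is blocked while its own handler runs; unblock it. */
    sigemptyset(&unblock);
    sigaddset(&unblock, sig);
    sigprocmask(SIG_UNBLOCK, &unblock, NULL);

    /* 4. Re-raise; with SIG_DFL in place this terminates the process. */
    raise(sig);
}

/* Hypothetical demo: run the pattern in a child and return its wait status. */
int run_demo(const char *path)
{
    int status = 0;
    pid_t pid = fork();
    assert(pid >= 0);

    if (pid == 0) {
        struct sigaction sa;
        struct rlimit no_core = { .rlim_cur = 0, .rlim_max = 0 };
        int fd;

        setrlimit(RLIMIT_CORE, &no_core);   /* avoid leaving a core file */

        strncpy(lock_path, path, sizeof lock_path - 1);
        fd = open(lock_path, O_CREAT | O_WRONLY, 0600);
        if (fd >= 0)
            close(fd);

        memset(&sa, 0, sizeof sa);
        sa.sa_handler = segv_cleanup_handler;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGSEGV, &sa, NULL);

        raise(SIGSEGV);   /* stand-in for a real invalid memory access */
        _exit(0);         /* not reached if the handler works */
    }

    waitpid(pid, &status, 0);
    return status;
}
```

Because the handler puts SIG_DFL back before re-raising, the process still terminates with SIGSEGV rather than exiting normally.  That is why a debugger sees the handler's frames sitting on top of the frame that actually faulted.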

Something is still screwy about the backtrace.  dbopen()
doesn't call arc4random_buf().  However, it does call
mkstemp(), which calls arc4random_uniform(), and that should
sit right next to arc4random_buf() in memory, so an address
in one could easily be reported as the other.  GCC
optimizations could be obscuring the call stack here.

It's certainly possible that arc4random is involved
somehow, but I don't yet see it.  It does seem likely
that we're looking at a libc problem, so a debug
version of libc might help.  Replacing libc on a
running system is a little tricky.  I believe the
following works, though I've not tried it:

% cd /usr/src/lib/libc
% make clean
% make DEBUG_FLAGS=-g
% cp /lib/libc.so.7 /lib/libc.so.7-backup
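... reboot to single user, use /rescue/sh as your shell ...
... / is mounted read-only and /usr may not be mounted yet ...
% mount -u /
% mount -a -t ufs
... use the static /rescue/cp so cp isn't running from the libc it overwrites ...
% /rescue/cp /usr/src/lib/libc/libc.so.7 /lib/libc.so.7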
... reboot ...

This should give you a standard libc with full
debugging symbols.  Hopefully, the backtrace will
now give more details.

I think we're getting closer.

Tim
Received on Sun Oct 05 2008 - 22:34:29 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:39:36 UTC