Re: phoenix crash in libc_r on sparc64

From: Thomas Moestl <t.moestl_at_tu-bs.de>
Date: Thu, 5 Jun 2003 01:56:08 +0200

On Wed, 2003/06/04 at 00:30:36 -0700, Kris Kennaway wrote:
> On Mon, Jun 02, 2003 at 04:15:43PM -0700, Kris Kennaway wrote:
> > phoenix on my sparc64 crashed while idle with the following:
> > 
> > Fatal error '_waitq_insert: Already in queue' at line 321 in file /usr/src/lib/libc_r/uthread/uthread_priority_queue.c (errno = 2)
> > 
> > Any ideas?

It should have dropped a core - can you please take a look at it with
gdb?

> One of the libc_r tests seems to hang:
> 
> Test static library:
> --------------------------------------------------------------------------
> Test                                      c_user c_system c_total     chng
>  passed/FAILED                            h_user h_system h_total   % chng
> --------------------------------------------------------------------------
> hello_d                                     0.00     0.02    0.02
>  passed
> --------------------------------------------------------------------------
> hello_s                                     0.00     0.02    0.02
>  passed
> --------------------------------------------------------------------------
> join_leak_d                                 0.77     0.18    0.95
>  passed
> --------------------------------------------------------------------------
> mutex_d                                     9.08    92.42  101.50
>  passed
> --------------------------------------------------------------------------
> sem_d                                       0.01     0.02    0.02
>  passed
> --------------------------------------------------------------------------
> sigsuspend_d                                0.00     0.02    0.02
>  passed
> --------------------------------------------------------------------------
> sigwait_d                                   0.00     0.02    0.02
>  *** FAILED ***
> --------------------------------------------------------------------------
> guard_s.pl
> 
> It's been sitting there for hours now.

This is an unfortunate failure mode, which is caused by a fault on the
stack while all signals are masked (by libc_r internals, I assume);
the kernel will fail to store the user register windows on the stack,
and because SIGILL is blocked, it cannot notify (or terminate) the
process and is stuck trying to copy out the register windows over and
over.
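
For illustration, the precondition for this hang (every blockable signal
masked, so that SIGILL cannot be delivered) can be sketched in plain C as
below. This is only a sketch using standard POSIX calls; the helper name
is made up, and it neither reproduces what libc_r does internally nor
triggers the register window fault itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/*
 * Hypothetical helper: block every signal, then report whether signo is
 * part of the resulting mask. Returns 1 if blocked, 0 if not, -1 on
 * error. The previous mask is restored before returning.
 */
int
blocked_when_all_masked(int signo)
{
	sigset_t all, old, cur;
	int blocked;

	if (sigfillset(&all) != 0)
		return (-1);
	if (sigprocmask(SIG_SETMASK, &all, &old) != 0)
		return (-1);
	/* Passing NULL as the new set only queries the current mask. */
	if (sigprocmask(SIG_SETMASK, NULL, &cur) != 0) {
		(void)sigprocmask(SIG_SETMASK, &old, NULL);
		return (-1);
	}
	blocked = sigismember(&cur, signo);
	(void)sigprocmask(SIG_SETMASK, &old, NULL);
	return (blocked);
}
```

With such a mask in place, blocked_when_all_masked(SIGILL) reports the
signal as blocked; only SIGKILL and SIGSTOP can never be masked.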

> P.S. Why do 3 of the tests even fail on i386?

The guard test includes constants which are machine- and
compiler-specific; it probably broke due to a gcc upgrade.

The sigwait test is killed by its own SIGUSR1, and this behaviour
actually looks correct to me (but I could easily be wrong, since the
signal behaviour of pthreads seems to be quite complex).
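
For reference, the general POSIX rule is that a signal has to be blocked
before it is sent if sigwait() is to pick it up; an unblocked SIGUSR1
with the default disposition terminates the process. A minimal
single-threaded sketch of that rule follows. It is not taken from the
test itself, and the function name is made up.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/*
 * Hypothetical helper: send SIGUSR1 to ourselves and collect it with
 * sigwait(). Returns the signal number received, or -1 on error.
 */
int
wait_for_own_sigusr1(void)
{
	sigset_t set, old;
	int sig = 0;

	if (sigemptyset(&set) != 0 || sigaddset(&set, SIGUSR1) != 0)
		return (-1);
	/* Block first: an unblocked, unhandled SIGUSR1 kills the process. */
	if (sigprocmask(SIG_BLOCK, &set, &old) != 0)
		return (-1);
	/* The signal stays pending until sigwait() consumes it. */
	if (raise(SIGUSR1) != 0 || sigwait(&set, &sig) != 0)
		sig = -1;
	(void)sigprocmask(SIG_SETMASK, &old, NULL);
	return (sig);
}
```

Without the sigprocmask() call, the raise() would terminate the caller
instead of being returned by sigwait().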

The propagate test failure is due to problems in libc (failing to
use the underscored versions of functions overridden in libc_r). The
attached patch should fix that; Daniel, does this look OK to you?

	- Thomas

-- 
Thomas Moestl <t.moestl_at_tu-bs.de>	http://www.tu-bs.de/~y0015675/
              <tmm_at_FreeBSD.org>		http://people.FreeBSD.org/~tmm/
PGP fingerprint: 1C97 A604 2BD0 E492 51D0  9C0F 1FE6 4F1D 419C 776C

Received on Wed Jun 04 2003 - 14:56:21 UTC
