Re: phoenix crash in libc_r on sparc64

From: Daniel Eischen <eischen_at_pcnet.com>
Date: Wed, 4 Jun 2003 23:09:41 -0400 (EDT)

On Thu, 5 Jun 2003, Thomas Moestl wrote:

> On Wed, 2003/06/04 at 00:30:36 -0700, Kris Kennaway wrote:
> > On Mon, Jun 02, 2003 at 04:15:43PM -0700, Kris Kennaway wrote:
> > > phoenix on my sparc64 crashed while idle with the following:
> > > 
> > > Fatal error '_waitq_insert: Already in queue' at line 321 in file /usr/src/lib/libc_r/uthread/uthread_priority_queue.c (errno = 2)
> > > 
> > > Any ideas?
> 
> It should have dropped a core - can you please take a look at it with
> gdb?
> 
> > One of the libc_r tests seems to hang:
> > 
> > Test static library:
> > --------------------------------------------------------------------------
> > Test                                      c_user c_system c_total     chng
> >  passed/FAILED                            h_user h_system h_total   % chng
> > --------------------------------------------------------------------------
> > hello_d                                     0.00     0.02    0.02
> >  passed
> > --------------------------------------------------------------------------
> > hello_s                                     0.00     0.02    0.02
> >  passed
> > --------------------------------------------------------------------------
> > join_leak_d                                 0.77     0.18    0.95
> >  passed
> > --------------------------------------------------------------------------
> > mutex_d                                     9.08    92.42  101.50
> >  passed
> > --------------------------------------------------------------------------
> > sem_d                                       0.01     0.02    0.02
> >  passed
> > --------------------------------------------------------------------------
> > sigsuspend_d                                0.00     0.02    0.02
> >  passed
> > --------------------------------------------------------------------------
> > sigwait_d                                   0.00     0.02    0.02
> >  *** FAILED ***

This one is supposed to kill the process at the end.

> > --------------------------------------------------------------------------
> > guard_s.pl
> > 
> > It's been sitting there for hours now.
> 
> This is an unfortunate failure mode, which is caused by a fault on the
> stack while all signals are masked (by libc_r internals, I assume);
> the kernel will fail to store the user register windows on the stack,
> and because SIGILL is blocked, it cannot notify (or terminate) the
> process and is stuck trying to copy out the register windows over and
> over.
> 
> > P.S. Why do 3 of the tests even fail on i386?
> 
> The guard test includes constants which are machine- and
> compiler-specific; this probably broke due to a gcc upgrade.
> 
> The sigwait test is killed by its own SIGUSR1, and this behaviour
> actually looks correct to me (but I could easily be wrong, since the
> signal behaviour of pthreads seems to be quite complex).

Right, that is part of the test.  I guess the expect script doesn't
know that though.

> The propagate test failure is due to problems in libc (failing to
> use the underscored versions of functions overridden in libc_r). The
> attached patch should fix that; Daniel, does this look OK to you?

Yes, if those functions are used in libc, then that is what
[un-]namespace.h is for.  Calls inside libc to functions that libc_r
overrides must use the single-underscore versions, so that libc_r
won't introduce cancellation points in places where there shouldn't
be any, or invoke signal handlers while a library-private lock is held.
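
A toy sketch of the idea, in case it helps.  None of these names exist
in libc or libc_r: demo_op() stands in for a function the threads
library overrides, raw_demo_op() stands in for its underscore version,
and the counter stands in for "a cancellation point was hit".  The
#define/#undef pair imitates what including namespace.h and
un-namespace.h does around library-internal code.

```c
/* Counts how often the overriding wrapper was entered. */
static int cancellation_points_hit;

/* Stand-in for the underscore version: the raw operation only. */
static int
raw_demo_op(int x)
{
	return (x * 2);
}

/* Stand-in for the override: extra thread-library work, then the op. */
static int
demo_op(int x)
{
	cancellation_points_hit++;	/* pretend cancellation point */
	return (raw_demo_op(x));
}

/* Like namespace.h: inside the library the plain name means the raw one. */
#define demo_op raw_demo_op

static int
library_internal_step(int x)
{
	return (demo_op(x));		/* expands to raw_demo_op(x) */
}

/* Like un-namespace.h: restore the plain name for everything after. */
#undef demo_op

static int
application_step(int x)
{
	return (demo_op(x));		/* goes through the wrapper */
}
```

Both functions compute the same result, but only application_step()
bumps the counter; library_internal_step() bypasses the wrapper, which
is the behaviour wanted for code running with a private lock held.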

-- 
Dan Eischen
Received on Wed Jun 04 2003 - 18:09:44 UTC