Re: SIGSEGV in /bin/sh after r322740 -> r322776 update

From: Hartmann, O. <ohartmann_at_walstatt.org>
Date: Tue, 22 Aug 2017 20:08:57 +0200
On Tue, 22 Aug 2017 06:38:36 -0700
David Wolfskill <david_at_catwhisker.org> wrote:

I also ran into this problem after "upgrading" to r322769, and now every
system on which I did this "upgrade" is wrecked and isn't even capable
of compiling a new kernel or world.

I can understand that weird things and havoc can happen on systems
running CURRENT with customised kernels, and that there can be hidden
problems, but this serious problem occurs even on vanilla GENERIC
systems up to r322798! I just tried to cleandir everything and rebuild
world and kernel, which on some slow boxes is a pain in the arse (and I
always thought LLVM/CLANG's goal was to shorten compile cycles ... the
opposite seems to be the case, by the way).

The question that arises with regard to GENERIC: do those changes even
get tested on real hardware, or is it all theory/virtual when committed?

Just a question. I'm awaiting this patch in the hope that I can rebuild
everything back to normal.

Thanks,

oh

> On Tue, Aug 22, 2017 at 04:19:58PM +0300, Konstantin Belousov wrote:
> > ...  
> > > > Ok, can you rebuild kernel and libc from scratch ?  I.e. remove
> > > > your object directories.  
> > > 
> > > I think I'll need a working /bin/sh to do that.  As noted, I could
> > > try the stable/11 /bin/sh; on the other hand, if it's dying in a
> > > library, that's not likely to help a whole lot. :-}  
> > I highly suspect that this is not /bin/sh at all.  Backtrace
> > strongly suggests that the malloc() has issues, but again I suspect
> > that the reason is not an issue in malloc, but its use of TLS.  
> 
> I think I hope that this use of "TLS" is not the one associated with
> (say) SSL....  :-}
> 
> > The amd64 changes were to the TLS base register handling.  So you
> > might try to boot previous kernel.  If this works out without
> > replacing libc then it is definitely TLS, but I still do not know
> > what is wrong. ....  
> 
> OK; we have a bit of progress, then:
> * When I tried to rename the kernel directories in /boot, I got more
>   segfaults.  So I figured I'd use the boot menu to select
>   kernel.old, and just tried "sudo shutdown -r now" -- and got a
>   segfault.  "sudo reboot" did, as well.  So did "sudo kill 1".  On a
>   whim, I tried "sudo halt"; that actually worked.
> 
> * After the (successful) reboot from kernel.old, I was able to rename
>   kernel directories without issue.  This may be useful evidence.
> 
> * Flushed with that success, I have started a fresh clean build of
>   r322776.  (I had managed to clear /usr/obj prior to the reboot.)
> 
> * I should be able to provide updated status within about 30 minutes.
> 
> Thanks again for all your help!
> 
> Peace,
> david
Received on Tue Aug 22 2017 - 16:09:13 UTC
