Re: upgrade 6-STABLE to -CURRENT on sparc64 renders box unusable

From: Peter Wemm <peter_at_wemm.org>
Date: Thu, 12 Jul 2007 13:09:23 -0700
On Tuesday 10 July 2007, Michiel Boland wrote:
> Well, in fact I did manage to debug this further. :)
>
> The problem is that on sparc64 and -CURRENT, every executable
> segfaults in
>
>   _rtld
>    init_rtld
>     relocate_objects
>      reloc_non_plt
>       mmap
>        __getosreldate
>
> It appears that __getosreldate was added five days ago, which may
> explain why the breakage on sparc64 hasn't been reported yet. (I am
> ccing peter_at_ since he committed this.)
>
> If I apply the following patch, then rebuild libc, things are more or
> less ok again. Of course this patch is very suboptimal, I am just
> trying to point out where the problem is.
>
> --- __getosreldate.c.orig	2007-07-10 22:29:02.000000000 +0200
> +++ __getosreldate.c	2007-07-10 22:28:20.000000000 +0200
> @@ -42,13 +42,10 @@
>   int
>   __getosreldate(void)
>   {
> -	static int osreldate;
> +	int osreldate;
>   	size_t len;
>   	int oid[2];
>   	int error, osrel;
> -
> -	if (osreldate != 0)
> -		return (osreldate);
>
>   	oid[0] = CTL_KERN;
>   	oid[1] = KERN_OSRELDATE;
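
[Editorial sketch, not part of the original mail.] The effect of the quoted workaround can be shown in a few lines of portable C. The sketch below assumes nothing about the real libc source: every name starting with sketch_ is hypothetical, the version number is made up, and the sysctl query is replaced by a counter so the two variants can be compared. It only shows that removing the static turns a one-time lookup into a lookup on every call. Why the static itself faults this early in rtld startup is not modelled here.

```c
/* All names below are hypothetical stand-ins, not FreeBSD libc code. */

static int sketch_lookups;	/* how many times the "kernel" was asked */

/* Stand-in for the sysctl(CTL_KERN, KERN_OSRELDATE) query. */
static int
sketch_query_kernel(void)
{
	sketch_lookups++;
	return (700045);	/* made-up version number */
}

/* Shape of the committed code: remember the answer in a static. */
static int
sketch_getosreldate_cached(void)
{
	static int osreldate;

	if (osreldate != 0)
		return (osreldate);
	osreldate = sketch_query_kernel();
	return (osreldate);
}

/* Shape of the quoted workaround: no static, so every call asks again. */
static int
sketch_getosreldate_uncached(void)
{
	return (sketch_query_kernel());
}
```

Calling the cached variant twice performs one lookup. Calling the uncached variant twice performs two, which is the cost the quoted mail calls "very suboptimal".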


Your other option would be to add WITHOUT_SYSCALL_COMPAT=yes
to /etc/make.conf.  That gets rid of the __getosreldate() calls
entirely, but at the expense of being able to boot an older
kernel after userland has been updated.  We could make this
option default on sparc64 if it was acceptable.
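
[Editorial sketch, not part of the original mail.] The trade-off described above can be pictured as a small decision function. This is an illustration under stated assumptions, not the libc wrapper: the names, the cutoff value, and the idea of passing the build option as a runtime flag are all hypothetical. A real build would settle WITHOUT_SYSCALL_COMPAT at compile time.

```c
/* Hypothetical names and numbers throughout; not the libc source. */
#define SKETCH_NEW_MMAP_OSRELDATE 700051	/* assumed cutoff, illustrative */

enum sketch_path { SKETCH_FREEBSD6_MMAP, SKETCH_NEW_MMAP };

/*
 * Decide which mmap system call a wrapper would issue.
 * kernel_osreldate: version reported by the running kernel.
 * without_syscall_compat: nonzero models WITHOUT_SYSCALL_COMPAT=yes.
 */
static enum sketch_path
sketch_pick_mmap(int kernel_osreldate, int without_syscall_compat)
{
	/* No compat code built in: never ask the kernel, always go new. */
	if (without_syscall_compat)
		return (SKETCH_NEW_MMAP);
	/* Compat built in: a new enough kernel gets the new call. */
	if (kernel_osreldate >= SKETCH_NEW_MMAP_OSRELDATE)
		return (SKETCH_NEW_MMAP);
	/* Older kernel: fall back to the old entry point. */
	return (SKETCH_FREEBSD6_MMAP);
}
```

With the compat flag set, an old kernel is still handed the new call, which is why a userland built that way could no longer be expected to run on an older kernel.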

Another option might be to hack rtld given the unusual circumstances:

Index: libexec/rtld-elf/sparc64/reloc.c
@@ -247,6 +247,9 @@
        return (0);
 }

+
+void *__sys_freebsd6_mmap(void *, size_t, int, int, int, int, off_t);
+
 int
 reloc_non_plt(Obj_Entry *obj, Obj_Entry *obj_rtld)
 {
@@ -260,7 +263,8 @@
         * The dynamic loader may be called from a thread, we have
         * limited amounts of stack available so we cannot use alloca().
         */
-       cache = mmap(NULL, bytes, PROT_READ|PROT_WRITE, MAP_ANON, -1, 0);
+       cache = __sys_freebsd6_mmap(NULL, bytes, PROT_READ|PROT_WRITE, MAP_ANON,
+           -1, 0, 0);
        if (cache == MAP_FAILED)
                cache = NULL;

This would avoid the pre-reloc-fixup use of __getosreldate() via
mmap.  In spite of the name, freebsd6_mmap is "standard" in the tree
right now and isn't going to become 'compat6' till comfortably after
the release.  The __getosreldate() thing would go away at the same time,
so the problem would be "solved".  The catch would be that a slightly
out-of-date userland would depend on COMPAT_FREEBSD6 on sparc64.  sparc64
boxes would be able to boot/run relatively old kernel.old's even after a
fresh build/install world.

PS: I've been told the same problem applies to powerpc.

PPS: I tried for 4 days to get a sun4v box to build world (shared with
sparc64).  I ended up giving up and just building/installing a new libc.
I forgot that ld-elf.so.1 was statically linked against libc_pic.a.
-- 
Peter Wemm - peter_at_wemm.org; peter_at_FreeBSD.org; peter_at_yahoo-inc.com
"All of this is for nothing if we don't go to the stars" - JMS/B5
Received on Thu Jul 12 2007 - 18:25:30 UTC
