Re: boot2 -- Round 2

From: Jeremy Chadwick <freebsd_at_jdc.parodius.com>
Date: Fri, 30 Jul 2004 23:03:23 -0700

I'll be sending you a file attachment (tarball) of applicable portions
of my /boot on "Box B"; feel free to take a peek and tell me if anything
looks awry.  If you'd like this posted to -current, let me know, otherwise
be expecting it in your personal inbox.

One work-around I have found: upon booting the machine (as it so happens,
via PXE) with a 5.2.1-RELEASE kernel and mfsroot (with /rescue added
to it) solely to run `disklabel -B ad0s1', things began working as
they should ("0:ad(0,a)").  I doubt disklabel itself is broken, as it
works great on "Box A".  Of course, I never saw the "backtick problem"
on Box A to begin with -- just a panic once boot1/boot2 was hit.

Just to throw this out there: would boot0cfg play any role in all of
this?  I've tinkered around with it on "Box B" solely for troubleshooting,
but all my changes have shown up where they should (i.e. in boot0, a.k.a.
"F1 FreeBSD").

What Matt chimed in with sounds dead on.  I'll have to take a little
peek at my DFly box...

-- 
| Jeremy Chadwick                                 jdc at parodius.com |
| Parodius Networking                        http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, USA |
| Making life hard for others since 1977.                             |

On Sat, Jul 31, 2004 at 12:27:13AM +0000, Alexander Kabaev wrote:
> On Fri, Jul 30, 2004 at 02:28:43PM -0700, Jeremy Chadwick wrote:
> > So, in regard to the committed fix:
> > 
> > This seemed to fix the issue on one of my boxes (the one which was
> > flat-out panic'ing, not the one which was reporting 0:ad(0,`) as the
> > default slice to load /boot/loader from).  I'll refer to the one which
> > panic'd as "Box A" while the one which is doing the backtick as "Box B".
> > 
> > After pulling cvs down last night and rebuilding world+kernel+boot
> > blocks, running disklabel -B ad0s1, all on Box B, I found the machine
> > once again spitting out "Invalid partition", trying to load loader(8)
> > off of 0:ad(0,`) instead of 0:ad(0,a).  I double-checked boot2/Makefile
> > to see if -fno-unit-at-a-time was in place -- and it was.
> > 
> > I've tried using /boot/boot off of Box A and applying it to Box B using
> > disklabel -B -b /boot/box_b/boot ad0s1 to no avail.
> > 
> > It seems almost as if the boot2 code is broken in such a way that it
> > resembles an "off-by-one" error (ASCII 0x60 == `, ASCII 0x61 == a).
> > Why it's picking ` is beyond me...
> > 
> > Can someone shed some light as to how I can go about debugging this,
> > as well as mention how I can temporarily work around this?  Box B
> > happens to run mysqld, and is suffering from some issues mentioned on
> > freebsd-threads (re: machine randomly hard-locking), so it definitely
> > needs to be able to boot back up on its own without my intervention.
> > 
> > Thanks!
> 
> Hi,
> 
> I guess I would like to get your /boot/boot. The one I got simply works
> on all boxes in my home :(.
> 
> As another option, you can try an alternative patch which was proposed
> by Tim Robbins. Since the problem was apparently caused by my going back to
> a static memcpy implementation, I am currently working on using the builtin
> memcpy as it was used before. I will post it later, after I've done some
> more testing and if things look good.
> 
> --
> Alexander Kabaev
> 
> ======== Begin quote ==============
> 
> After a few hours of head-scratching, I've tracked down the problem with
> boot2 and -funit-at-a-time, and come up with a patch that makes it work:
> 
> ==== //depot/user/tjr/freebsd-tjr/src/sys/boot/i386/boot2/boot2.c#7 - /home/tim/p4/src/sys/boot/i386/boot2/boot2.c ====
> @@ -139,7 +139,16 @@
>  static int xgetc(int);
>  static int getc(int);
>  
> -static void memcpy(void *, const void *, int);
> +/*
> + * GCC 3.4 with -funit-at-a-time (implied by -Os) may use a non-standard
> + * calling convention for static functions, using registers to pass arguments
> + * instead of the stack. However, GCC may emit calls to memcpy() when a
> + * program copies a struct with the assignment operator, and the code it
> + * emits to call memcpy() uses the standard convention, not the register
> + * convention. This means we must declare our memcpy() implementation "__used"
> + * to disable the register calling convention.
> + */
> +static void memcpy(void *, const void *, int) __used;
>  static void
>  memcpy(void *dst, const void *src, int len)
>  {
> 
> 
> I think this is a bug in GCC; it should emit a warning if it's about to emit
> code to call memcpy(), but finds that memcpy() has a prototype that conflicts
> with the assumptions it makes.
> 
> 
> Tim
> 
> _______________________________________________
> freebsd-current_at_freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe_at_freebsd.org"
Received on Sat Jul 31 2004 - 04:04:03 UTC