Re: ZFS boot problems with memory > 1MB

From: John Baldwin <jhb_at_freebsd.org>
Date: Wed, 24 Feb 2010 09:55:27 -0500
On Tuesday 23 February 2010 7:59:58 pm Brandon Gooch wrote:
> On Tue, Feb 23, 2010 at 10:40 PM, John Baldwin <jhb_at_freebsd.org> wrote:
> > On Tuesday 23 February 2010 5:04:03 pm Brandon Gooch wrote:
> >> On Tue, Feb 23, 2010 at 3:03 PM, John Baldwin <jhb_at_freebsd.org> wrote:
> >> > On Tuesday 23 February 2010 3:36:19 pm Brandon Gooch wrote:
> >> >> On Tue, Feb 23, 2010 at 1:01 PM, John Baldwin <jhb_at_freebsd.org> wrote:
> >> >> > On Tuesday 23 February 2010 12:36:31 pm Brandon Gooch wrote:
> >> >> >> On Tue, Feb 23, 2010 at 10:24 AM, John Baldwin <jhb_at_freebsd.org> wrote:
> >> >> >> > On Tuesday 23 February 2010 10:28:49 am Brandon Gooch wrote:
> >> >> >> >> On Tue, Feb 23, 2010 at 7:29 AM, Andriy Gapon <avg_at_icyb.net.ua> wrote:
> >> >> >> >> > on 23/02/2010 13:18 Renato Botelho said the following:
> >> >> >> >> >> On Mon, Feb 22, 2010 at 7:35 PM, Chris Hedley
> >> >> >> >> >> <freebsd-current_at_chrishedley.com> wrote:
> >> >> >> >> > [snip]
> >> >> >> >> >>> Do you have USB legacy support enabled in your BIOS?  I'm not sure if
> >> >> >> >> >>> there's an option for the loader to use USB devices natively, but the
> >> >> >> >> >>> BIOS's legacy option where it provides AT/PS2 emulation is probably the
> >> >> >> >> >>> easiest way to get the keyboard working.
> >> >> >> >> >>
> >> >> >> >> >> Yes, I do, but it seems to be a regression in FreeBSD itself.  I had
> >> >> >> >> >> this problem in the past, and I checked again the same things I needed
> >> >> >> >> >> to check back then; everything is fine.
> >> >> >> >> >
> >> >> >> >> > A more precise way to state that would be "a regression in FreeBSD
> >> >> >> >> > boot/loader".  I think that you are referring to the issue that was
> >> >> >> >> > fixed by r189017.  It might be worthwhile to investigate what was done
> >> >> >> >> > in that revision and what has happened in the sys/boot code since then.
> >> >> >> >> >
> >> >> >> >> > One possibility is that your BIOS uses memory above 1MB for USB
> >> >> >> >> > emulation but doesn't mark that memory as used in the system memory
> >> >> >> >> > map.  In that case the loader could overwrite that memory, and the
> >> >> >> >> > blame would be on the BIOS.  Alternatively, our code might be parsing
> >> >> >> >> > the system memory map incorrectly.  But I am just making wild guesses
> >> >> >> >> > here.
> >> >> >> >> >
> >> >> >> >>
> >> >> >> >> I don't know if it is at all related, but this commit has caused
> >> >> >> >> problems for me booting at least one of my machines:
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> http://svn.freebsd.org/viewvc/base/head/sys/boot/i386/zfsboot/zfsboot.c?r1=199714&r2=200309
> >> >> >> >>
> >> >> >> >> Commit message:
> >> >> >> >>
> >> >> >> >> Revision 200309
> >> >> >> >> Modified Wed Dec 9 20:36:56 2009 UTC (2 months, 2 weeks ago) by jhb
> >> >> >> >> File length: 24893 byte(s)
> >> >> >> >> Diff to previous 199714
> >> >> >> >> - Port bios_getmem() from libi386 to {gpt,}zfsboot() and use it to
> >> >> >> >>   safely allocate a heap region above 1MB.  This enables {gpt,}zfsboot()
> >> >> >> >>   to allocate much larger buffers than before.
> >> >> >> >> - Use a larger buffer (1MB instead of 128K) for temporary ZFS buffers.
> >> >> >> >>   This allows more reliable reading of compressed files in a
> >> >> >> >>   raidz/raidz2 pool.
> >> >> >> >>
> >> >> >> >> Submitted by: Matt Reimer  mattjreimer of gmail
> >> >> >> >> MFC after:    1 week
> >> >> >> >
> >> >> >> > Starting a new thread: which problems are you seeing with this change?
> >> >> >> > ZFS is a good bit more memory-hungry than UFS, so it really needs to use
> >> >> >> > high memory for its heap.  Also, I wonder whether you still have
> >> >> >> > problems if you use the older zfsboot with the newer zfsloader.
> >> >> >> > Finally, you need to use disklabel -B or some such to update the
> >> >> >> > zfsboot bits for this change to take effect.
> >> >> >> >
> >> >> >> > --
> >> >> >> > John Baldwin
> >> >> >> >
> >> >> >>
> >> >> >> I filed a PR so it wouldn't fall through the cracks:
> >> >> >>
> >> >> >> http://www.freebsd.org/cgi/query-pr.cgi?pr=144234
> >> >> >>
> >> >> >> I guess I tried a combination of various revisions of bootstrap code
> >> >> >> and loaders when I first encountered the issue.  It was when I wrote a
> >> >> >> recent gptzfsboot to the geom that I saw the symptoms:
> >> >> >>
> >> >> >> error 1 lba 48
> >> >> >> error 1 lba 1
> >> >> >> No ZFS pools located, can't boot
> >> >> >>
> >> >> >> I just wound up using sys/boot/i386/zfsboot/zfsboot.c revision 199714
> >> >> >> to build a working gptzfsboot on another system and wrote that to the
> >> >> >> disk to get the machine operational.
> >> >> >
> >> >> > Try this:
> >> >> >
> >> >> > Index: zfsboot.c
> >> >> > ===================================================================
> >> >> > --- zfsboot.c   (revision 204207)
> >> >> > +++ zfsboot.c   (working copy)
> >> >> > @@ -467,6 +467,7 @@
> >> >> >  static inline void
> >> >> >  putc(int c)
> >> >> >  {
> >> >> > +    v86.ctl = 0;
> >> >> >     v86.addr = 0x10;
> >> >> >     v86.eax = 0xe00 | (c & 0xff);
> >> >> >     v86.ebx = 0x7;
> >> >> > @@ -617,6 +618,8 @@
> >> >> >     off_t off;
> >> >> >     struct dsk *dsk;
> >> >> >
> >> >> > +    dmadat = (void *)(roundup2(__base + (int32_t)&_end, 0x10000) - __base);
> >> >> > +
> >> >> >     bios_getmem();
> >> >> >
> >> >> >     if (high_heap_size > 0) {
> >> >> > @@ -627,9 +630,6 @@
> >> >> >        heap_end = (char *) PTOV(bios_basemem);
> >> >> >     }
> >> >> >
> >> >> > -    dmadat = (void *)(roundup2(__base + (int32_t)&_end, 0x10000) - __base);
> >> >> > -    v86.ctl = V86_FLAGS;
> >> >> > -
> >> >> >     dsk = malloc(sizeof(struct dsk));
> >> >> >     dsk->drive = *(uint8_t *)PTOV(ARGS);
> >> >> >     dsk->type = dsk->drive & DRV_HARD ? TYPE_AD : TYPE_FD;
> >> >> > @@ -1157,6 +1157,7 @@
> >> >> >      * when no such key is pressed in reality. As far as I can tell,
> >> >> >      * this only happens shortly after a reboot.
> >> >> >      */
> >> >> > +    v86.ctl = V86_FLAGS;
> >> >> >     v86.addr = 0x16;
> >> >> >     v86.eax = fn << 8;
> >> >> >     v86int();
> >> >> >
> >> >> > --
> >> >> > John Baldwin
> >> >> >
> >> >>
> >> >> It still breaks:
> >> >>
> >> >> error 1 lba 48
> >> >> error 1 lba 1
> >> >> No ZFS pools located, can't boot
> >> >
> >> > Ok.  Can you add a printf to zfsboot.c to print out dsk->start in the
> >> > case that you get an error?  "error 1" means that the BIOS thinks it got a
> >> > bad parameter, presumably in the disk packet.  If you want to be
> >> > ambitious, just print out all of the fields in the packet when it fails.
> >> >
> >> > --
> >> > John Baldwin
> >> >
> >>
> >> Adding printf statements to drvread():
> >>
> >> printf("dsk->xxx: %u\n", dsk->xxx);
> >>
> >> Output:
> >>
> >> error 1 lba 48
> >> dsk->drive: 0
> >> dsk->type: 0
> >> dsk->unit: 0
> >> dsk->slice: 0
> >> dsk->part: 0
> >> dsk->init: 0
> >> dsk->start: 978673664
> >
> > This value looks a bit high.  Do you have a partition that starts at an
> > offset of about 466GB into the disk?
> >
> >> error 1 lba 1
> >> dsk->drive: 0
> >> dsk->type: 0
> >> dsk->unit: 0
> >> dsk->slice: 0
> >> dsk->part: 0
> >> dsk->init: 0
> >> dsk->start: 0
> >> No ZFS pools located, can't boot
> >
> > Sorry, I meant members of the 'packet' variable, though dsk->start is
> > useful to have as well.
> >
> > --
> > John Baldwin
> >
> 
> Here it is (with some crazy dsk stuff included):
> 
> error 1 lba 48
> packet.len: 16
> packet.seg: 8192
> packet.count: 16
> packet.lba: 47
> packet.off: 0
> dsk->drive: 4294967295
> dsk->slice: 4294967295
> dsk->type: 4294967295
> dsk->part: 4294967295
> dsk->unit: 4294967295
> dsk->init: 4294967295
> dsk->start: 4294967295

These are all -1 now, which looks wrong.  The raw LBA being 47 instead of 48
would seem to indicate that that is the case, though.

> error 1 lba 1
> packet.len: 16
> packet.seg: 8704
> packet.count: 1
> packet.lba: 1
> packet.off: 0

Odd that the lba here isn't 0.

Can you add some more printfs, maybe to probe_drive(), to try to narrow down how
many times it is being invoked and for which drive numbers?

-- 
John Baldwin
Received on Wed Feb 24 2010 - 13:57:32 UTC
