Re: ZFS patches. [Problem with root on zfs and upgrading]

From: George Hartzell <hartzell_at_alerce.com>
Date: Tue, 29 Jul 2008 10:26:05 -0700
On Tue, 2008-07-29 at 09:24 -0700, George Hartzell wrote:
> Beat Gätzi writes:
>  > Hi,
>  > 
>  > Pawel Jakub Dawidek wrote:
>  > > The patch above contains the most recent ZFS version that could be found
>  > > in OpenSolaris as of today. Apart from a large amount of new functionality,
>  > > I believe there are many stability (and also performance) improvements
>  > > compared to the version from the base system.
>  > 
>  > Thanks for the great work!
>  > 
>  > > Please test, test, test. If I get enough positive feedback, I may be
>  > > able to squeeze it into 7.1-RELEASE, but this might be hard.
>  > 
>  > I have an amd64 box with 8GB RAM running the CURRENT-200806 snapshot. I got
>  > the latest version of the sources with csup, applied your patch and
>  > built the world/kernel.
>  > /usr/src and /usr/obj are located on a zfs file system. After "make
>  > installkernel" and reboot into single user mode I had to start the zfs
>  > file system but it failed:
>  > 
>  > # fsck
>  > # mount -a
>  > # /etc/rc.d/hostid start
>  > Setting hostuuid: ...
>  > Setting hostid: ...
>  > # /etc/rc.d/zfs start
>  > lock order reversal:
>  >  1st 0xffffff0004832620 ufs (ufs) @ /usr/src/sys/kern/vfs_subr.c:2053
>  >  2nd 0xffffffff80b09da0 kernel linker (kernel linker) @
>  > /usr/src/sys/kern/kern_linker.c:693
>  > KDB: stack backtrace:
>  > db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
>  > witness_checkorder() at witness_checkorder+0x609
>  > _sx_xlock() at _sx_xlock+0x52
>  > linker_file_lookup_set() at linker_file_lookup_set+0xe1
>  > linker_file_register_sysctls() at linker_file_register_sysctls+0x20
>  > linker_load_module() at linker_load_module+0x919
>  > linker_load_dependencies() at linker_load_dependencies+0x1bc
>  > link_elf_load_file() at link_elf_load_file+0xa96
>  > linker_load_module() at linker_load_module+0x8cf
>  > kern_kldload() at kern_kldload+0xac
>  > kldload() at kldload+0x84
>  > syscall() at syscall+0x1bf
>  > Xfast_syscall() at Xfast_syscall+0xab
>  > --- syscall (304, FreeBSD ELF64, kldload), rip = 0x80068561c, rsp =
>  > 0x7fffffffec88, rbp = 0 ---
>  > This module (opensolaris) contains code covered by the
>  > Common Development and Distribution License (CDDL)
>  > see http://opensolaris.org/os/licensing/opensolaris_license/
>  > WARNING: ZFS is considered to be an experimental feature in FreeBSD.
>  > ZFS filesystem version 11
>  > ZFS storage pool version 11
>  > internal error: out of memory
>  > internal error: out of memory
>  > internal error: out of memory
>  > internal error: out of memory
>  > 
>  > Running "zpool list" shows no available pool and the "internal error:
>  > out of memory" error message.
>  > 
>  > The same problem occurs in multi-user mode. loader.conf is set to:
>  > vm.kmem_size_max="2147483648"
>  > vm.kmem_size="2147483648"
>  > 
>  > Increasing or removing the kmem_size values didn't change anything.
>  > 
>  > To solve the problem I had to boot kernel.old and run make
>  > installworld/mergemaster. After rebooting with the new kernel the pool
>  > was available again and everything worked without a problem.
>  > 
>  > Did I do something wrong when I upgraded the server?
> 
> I'm being bitten by the problem that bit Beat, but worse.
> 
> I'm running a root on zfs system, built using variations of Yarema's
> tools, which do a great job of rounding up and automating all of the
> little tips and tricks about putting your root on a zfs filesystem.
> You should read and understand what they're doing though; you'll
> probably need to adapt them a bit
>   [ http://yds.coolrat.org/zfsboot.shtml ].
> 
> I moved a computer from -STABLE up to -CURRENT via csup and rebuilt
> everything to convince myself that the upgrade went well.
> 
> Then I applied Pawel's patch (-p0 -E), and:
> 
>   make buildworld
>   make buildkernel KERNCONF=BLUETOO
>   make installkernel KERNCONF=BLUETOO
> 
> and rebooted.  I planned to drop down to single user and do the
> mergemaster/installworld.
> 
> When I try to boot multi user things go south and it's clear that /usr
> et al. is missing.
> 
> I can boot my new kernel single user and my root gets mounted from my
> zpool, but none of my other zfs filesystems are mounted, and when I
> try to run zfs list or zpool status I get the same out of memory
> message that Beat sees.
> 
> The ZFS filesystem and pool are at version 11 (seen scrolling by on
> the console).
> 
> I suspect that my newer kernel isn't cooperating with the older
> userland utilities which prevents the filesystems from being mounted.
> 
> I tried to boot from kernel.old, but I end up at the mountroot prompt
> and can't mount my root.  Presumably since my pool has been
> automagically upgraded to version 11 I can no longer mount my root
> using kernel.old, so Beat's end-run won't help me.
> 
> There's nothing I care about on the machine, just the time it took to
> csup and build and such, so if I have to scrag it and start over it's
> not the end of the world.
> 
> Maybe someone could make a patched copy of /sbin/zfs (and whatever
> dependencies it has into /lib, etc...) available and I could drop them
> onto a usb key and use some combination of PATH and LD_LIBRARY_PATH to
> use them to get my /usr etc... mounted?
> 
> Or I could build up another machine to the same patched point, do the
> buildworld and buildkernel, then use that to make a patched bootable
> usb drive.  That'll take a while to free up the extra hardware though.

It turns out that I can boot into single user with the new kernel and
then mount the zfs filesystems by hand, like this:

  mount -t zfs z/usr /usr

I just need to do it (a little scripting on a similar system helps) for
the 43 zfs filesystems that Yarema's tools set up, and I'm booted
multi-user with Pawel's new patches.
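
For anyone in the same spot, here is a rough sketch of the kind of
scripting I mean. It assumes you can get a "name mountpoint" list from a
similar, working system (for example with
"zfs list -H -o name,mountpoint > zfs-mounts.txt") and copy it over. The
file name, the function name and the MOUNT_CMD variable are made up for
illustration. By default it only prints the mount commands, so you can
check them before running anything for real.

```shell
#!/bin/sh
# Sketch: mount ZFS datasets by hand from a saved "name mountpoint" list.
# Assumption: the list was produced on a similar, working system, since
# "zfs list" fails with the out-of-memory error on the broken one.

mount_from_list() {
    # $1: path to the list file, one "dataset mountpoint" pair per line.
    # Sort on the mountpoint so parents (/usr) come before children
    # (/usr/local).
    LC_ALL=C sort -k 2 "$1" | while read -r name mountpoint; do
        case "$mountpoint" in
            /)  continue ;;   # root is already mounted by the kernel
            /*) ;;            # absolute path: mount it
            *)  continue ;;   # skip "none", "legacy", "-" or blank entries
        esac
        # MOUNT_CMD is a hypothetical knob: unset means dry run (echo).
        ${MOUNT_CMD:-echo mount} -t zfs "$name" "$mountpoint"
    done
}
```

Run "mount_from_list zfs-mounts.txt" to see what it would do, then
"MOUNT_CMD=mount mount_from_list zfs-mounts.txt" to do it. Datasets whose
mountpoint is not an absolute path are skipped, so anything listed as
"legacy" still has to be handled separately.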

phew.

g.
Received on Tue Jul 29 2008 - 15:49:06 UTC
