Re: What is OpenZFS doing during boot?

From: Alan Somers <asomers_at_freebsd.org> Date: Fri, 30 Apr 2021 08:12:25 -0600 · This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:41:28 UTC

On Fri, Apr 30, 2021 at 7:21 AM Ulrich Spörlein <uqs_at_freebsd.org> wrote:

> Hi folks, this is a stable/13 question but I figured it's still close
> enough to -CURRENT to count.
>
> So I wanted to update my (remote) system with freebsd-update, but that
> installed half a kernel and bricked the machine upon reboot. Lucky me, I had
> fixed OOB access just the day before.
>
> I did the usual world/kernel build and ran etcupdate, merging in my
> local changes. This bricked the system again, as it removed the executable
> bit from /etc/rc.d/netif; I filed
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=255514 for that, though.
> (I never had such trouble with mergemaster; even just understanding what
> etcupdate is trying to do and how to bootstrap it is a mystery to me.)
>
> Anyway, I have a data zpool on 2x encrypted GELI providers that I can only
> unlock (and zpool import) with 2 passphrases after the system has booted.
>
> Color me surprised when some rc script thought otherwise and tried to
> import the pool during boot. Why does it do that? That's not supposed to
> work, and it should not even touch the encrypted bits (yet).
>
> mountroot: waiting for device /dev/mirror/gm0a...
> Dual Console: Serial Primary, Video Secondary
> GEOM_ELI: Device gpt/swap0.eli created.
> GEOM_ELI: Encryption: AES-XTS 128
> GEOM_ELI:     Crypto: accelerated software
> GEOM_ELI: Device gpt/swap1.eli created.
> GEOM_ELI: Encryption: AES-XTS 128
> GEOM_ELI:     Crypto: accelerated software
> Setting hostuuid: d7902500-4c7c-0706-0025-90d77c4c0e0f.
> Setting hostid: 0x8a2b4277.
> cannot import 'data': no such pool or dataset
>         Destroy and re-create the pool from
>         a backup source.
> ipmi0: Unknown IOCTL 40086481
> ipmi0: Unknown IOCTL 40086481
> cachefile import failed, retrying
> nvpair_value_nvlist(nvp, &rv) == 0 (0x16 == 0)
> ASSERT at /usr/src/sys/contrib/openzfs/module/nvpair/fnvpair.c:586:fnvpair_value_nvlist()Abort trap
> pid 69 (zpool), jid 0, uid 0: exited on signal 6
> cannot import 'data': no such pool or dataset
>         Destroy and re-create the pool from
>         a backup source.
> ipmi0: Unknown IOCTL 40086481
> ipmi0: Unknown IOCTL 40086481
> pid 74 (zpool), jid 0, uid 0: exited on signal 6
> cachefile import failed, retrying
> nvpair_value_nvlist(nvp, &rv) == 0 (0x16 == 0)
> ASSERT at /usr/src/sys/contrib/openzfs/module/nvpair/fnvpair.c:586:fnvpair_value_nvlist()Abort trap
> Starting file system checks:
> /dev/mirror/gm0a: FILE SYSTEM CLEAN; SKIPPING CHECKS
> /dev/mirror/gm0a: clean, 370582 free (814 frags, 46221 blocks, 0.2% fragmentation)
> /dev/mirror/gm0d: FILE SYSTEM CLEAN; SKIPPING CHECKS
> /dev/mirror/gm0d: clean, 867640 free (1160 frags, 108310 blocks, 0.1% fragmentation)
> /dev/mirror/gm0e: FILE SYSTEM CLEAN; SKIPPING CHECKS
> /dev/mirror/gm0e: clean, 1267948 free (17228 frags, 156340 blocks, 0.7% fragmentation)
> Mounting local filesystems:.
>
>
> What do I need to do to _not_ have any zpool operations be attempted during
> startup? How does it even know of the existence of that pool?
>
> I guess it's zfs_enable=NO to stop /etc/rc.d/zpool from messing about. But
> more importantly, the GELI providers don't exist yet, so why does zpool
> crash on an assertion (signal 6) instead of just failing? Shouldn't it be
> a bit more robust on that front?
>
> Thanks all
> Uli
>

Your problem is the zpool cache file.  As soon as ZFS loads, it tries to
import all pools mentioned in /boot/zfs/zpool.cache.  If you're using ZFS
on top of GELI, then obviously you don't want that.  What you should do is
move the cachefile somewhere else.  Do it like this:
$ zpool set cachefile=/some/where/else my-data-pool
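For illustration, the one-time step can be wrapped in a small sh(1) sketch that also checks the property afterwards with `zpool get`. The `run` helper, the `DRY_RUN` switch and the `move_cachefile` function are hypothetical names added here, not part of any FreeBSD tool, and the pool name and path are the same placeholders as above. By default the sketch only prints the commands; nothing is changed unless it is run with DRY_RUN=0.

```sh
#!/bin/sh
# Sketch: record the pool in a non-default cachefile so the boot-time
# rc scripts no longer find it. POOL and CACHEFILE are placeholders;
# run/DRY_RUN/move_cachefile are hypothetical helpers for illustration.
set -eu

POOL="${POOL:-my-data-pool}"
CACHEFILE="${CACHEFILE:-/some/where/else}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = execute them

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

move_cachefile() {
    run zpool set "cachefile=$CACHEFILE" "$POOL"
    # Show the property afterwards to confirm the new location.
    run zpool get cachefile "$POOL"
}

move_cachefile
```

Setting POOL=data would point it at the pool from the log above.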

And on every boot, import it like this:
$ service geli start
$ zpool import -a -c /some/where/else -o cachefile=/some/where/else
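The per-boot steps could likewise live in a hypothetical helper script that you run by hand once the machine is up. Again, `run`, `DRY_RUN` and `unlock_and_import` are illustrative names, and /some/where/else stands for whatever path you chose; the only real commands are the two shown above. `set -e` makes the script stop if unlocking the providers fails, so the import is never attempted against missing devices.

```sh
#!/bin/sh
# Sketch: unlock the GELI providers, then import every pool recorded in
# the alternate cachefile. CACHEFILE is a placeholder path;
# run/DRY_RUN/unlock_and_import are hypothetical helpers.
set -eu

CACHEFILE="${CACHEFILE:-/some/where/else}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = execute them

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

unlock_and_import() {
    # Attach the encrypted providers first; the pool's vdevs do not
    # exist until this succeeds (set -e aborts the script otherwise).
    run service geli start
    # Import from the alternate cachefile and keep the pool recorded
    # there rather than in the default cache file.
    run zpool import -a -c "$CACHEFILE" -o "cachefile=$CACHEFILE"
}

unlock_and_import
```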

Hope this helps.
-Alan