> On 7. mai 2017, at 13:18, Julian Elischer <julian_at_freebsd.org> wrote:
>
> On 7/5/17 1:45 pm, Warner Losh wrote:
>> On Sat, May 6, 2017 at 10:03 PM, Julian Elischer <julian_at_freebsd.org> wrote:
>>> On 6/5/17 4:01 am, Toomas Soome wrote:
>>>>
>>>>> On 5. mai 2017, at 22:07, Julian Elischer <julian_at_freebsd.org <mailto:julian_at_freebsd.org>> wrote:
>>>>>
>>>>> Subject says it all really, is this an option at this time?
>>>>>
>>>>> we'd like to try to boot the main zfs root partition and then fall back to a small UFS based recovery partition.. is that possible?
>>>>>
>>>>> I know we could use grub but I'd prefer to keep it in the family.
>>>>>
>>>>
>>>> it is, sure. but there is a compromise to be made for it.
>>>>
>>>> Let's start with what I have done in the illumos port, as the idea there is exactly about having as “universal” binaries as possible (just the binaries are listed below to get the size):
>>>>
>>>> -r-xr-xr-x 1 root sys 171008 apr 30 19:55 bootia32.efi
>>>> -r-xr-xr-x 1 root sys 148992 apr 30 19:55 bootx64.efi
>>>> -r--r--r-- 1 root sys   1255 okt 25  2015 cdboot
>>>> -r--r--r-- 1 root sys 154112 apr 30 19:55 gptzfsboot
>>>> -r-xr-xr-x 1 root sys 482293 mai  2 21:10 loader32.efi
>>>> -r-xr-xr-x 1 root sys 499218 mai  2 21:10 loader64.efi
>>>> -r--r--r-- 1 root sys    512 okt 15  2015 pmbr
>>>> -r--r--r-- 1 root sys 377344 mai  2 21:10 pxeboot
>>>> -r--r--r-- 1 root sys 376832 mai  2 21:10 zfsloader
>>>>
>>>> the loader (bios/efi) is built with the full complement - zfs, ufs, dosfs, cd9660, nfs, tftp + gzipfs. The cdboot is starting zfsloader (that's a trivial string change).
>>>>
>>>> The gptzfsboot in the illumos case is built with only zfs, dosfs and ufs - as it has to support only disk based media to read out the loader. Also I am building gptzfsboot with libstand and libi386 to get as much shared code as possible - which has both good and bad sides, as usual ;)
>>>>
>>>> The gptzfsboot size means that with ufs a dedicated boot partition is needed (freebsd-boot); with zfs the illumos port is always using the 3.5MB boot area after the first 2 labels (as there is no geli, illumos does not need a dedicated boot partition with zfs).
>>>>
>>>> As the freebsd-boot partition is currently created at 512k, the size is not an issue. Also, using common code allows the generic partition code to be used, so GPT/MBR/BSD (VTOC in the illumos case) labels are not a problem.
>>>>
>>>> So, even just with cd boot (iso) starting zfsloader (which in fbsd has built-in ufs, zfs etc), you already get rescue capability.
>>>>
>>>> Now, even with just adding a ufs reader to gptzfsboot, we can use gpt + freebsd-boot and a ufs root while loading zfsloader on a usb image, so it can be used for both live/install and rescue, because zfsloader itself has support for all file systems + partition types.
>>>>
>>>> I have kept myself a bit away from freebsd gptzfsboot for a simple reason - the older setups have a smaller size for freebsd-boot, and not everyone is necessarily happy about size changes :D also in the freebsd case there is another factor called geli - it most certainly does contribute some bits, but also needs to be properly addressed in the IO call stack (as we have seen with the zfsbootcfg bits). But then again, here too the shared code can help to reduce the complexity.
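
To make the sizes above concrete: on the freebsd side a layout like the following gives the 512k freebsd-boot plus a small UFS rescue partition next to the pool. This is only a sketch - the device name, sizes and labels are placeholders, and whether the boot blocks can then read the ufs partition or only the pool depends on what gptzfsboot is built with, as noted above:

  gpart create -s gpt ada0
  gpart add -t freebsd-boot -s 512k -l boot ada0      # boot blocks (gptzfsboot)
  gpart add -t freebsd-ufs -s 1g -l recovery ada0     # small UFS rescue root
  gpart add -t freebsd-zfs -l root ada0               # the normal zfs pool
  gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada0
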
>>>>
>>>> Yea, the zfsloader/loader*.efi in that listing above is actually built with framebuffer code and a compiled-in 8x16 default font (lz4-compressed ascii+boxdrawing basically - because zfs has lz4, the decompressor is always there), and ficl 4.1, so that's a bit of a difference from the fbsd loader.
>>>>
>>>> Also note that we can still build the smaller dedicated blocks like boot2, just that we can not use those blocks for the more universal cases, and eventually those special cases will diminish.
>>>
>>> thanks for that..
>>>
>>> so, here's my exact problem I need to solve.
>>> FreeBSD 10 (or newer) on Amazon EC2.
>>> We need to have a plan for recovering from the scenario where something goes wrong (e.g. during an upgrade) and we are left with a system where the default zpool rootfs points to a dataset that doesn't boot. It is possible that maybe the entire pool is unbootable into multi-user.. Maybe somehow it filled up? who knows. It's hard to predict future problems.
>>> There is no console access at all so there is no possibility of human intervention. So all recovery paths that start "enter single user mode and...." are unusable.
>>>
>>> The customers who own the amazon account are not crazy about giving us the keys to the kingdom for all their EC2 instances, so taking a root drive off a 'sick' VM and grafting it onto a freebsd instance to 'repair' it becomes a task we don't really want to have to ask them to do. They may not have the in-house expertise to do it confidently.
>>>
>>> This leaves us with automatic recovery, or at least automatic methods of getting access to that drive from the network.
>>> Since the regular root is zfs, my gut feeling is that to reduce the chances of confusion during recovery, I'd like the (recovery) system itself to be running off a UFS partition, and potentially with a memory root filesystem. As long as it can be reached over the network we can then take over.
>>>
>>> we'd also like to have the boot environment support in the bootcode.
>>> so, what would be the minimum set we'd need?
>>>
>>> UFS support, zfs support, BE support, and support for selecting a completely different boot procedure after some number of boot attempts without getting all the way to multi-user.
>>>
>>> How does that come out size-wise? And what do I need to configure to get that?
>>>
>>> The current EC2 instances have a 64kB boot partition, but I have a window to convince management to expand that if I have a good enough argument (since we are doing a repartition on the next upgrade, which is "special" - it's our upgrade to 10.3 from 8.0).
>>> Being able to self-heal, or at least 'get at' a sick instance, might be a good enough argument and would make the EC2 instances the same as all the other versions of the product..
>> You should convince them to move to 512k post-haste. I doubt 64k will suffice, and 512k is enough to get all the features you desire.
>
> yeah I know but sometimes convincing management of things is like banging one's head against a wall.
> Don't think I haven't tried, and won't keep trying.
>

To support recovery there can be 2 scenarios:

1. something has gone bad and you boot from alternate media (iso/usb/net), log in, and fix the setup.

2. if the alternate media is not available, there has to be a recovery “image”, preferably isolated from the rest of the system, such as a recovery partition.
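
For the second case, the recovery root does not even have to be mounted from the disk directly - the loader can preload a small UFS image and the kernel can run it from memory, which keeps the rescue environment isolated from whatever is broken on disk. A rough loader.conf sketch of that idea (the image path is a placeholder and the preload type should be verified for the target release; GENERIC has md(4)/MD_ROOT):

  mfs_load="YES"
  mfs_type="mfs_root"
  mfs_name="/recovery.ufs"
  vfs.root.mountfrom="ufs:/dev/md0"
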
The second option needs a mechanism to get activated; something like “X times try normal boot, then use recovery”. The zfsbootcfg Andriy did is currently providing the reverse option - try this config; if it fails, fall back to normal. But that work can be used as a base nevertheless - to provide not a one-time [next] boot config, but a fallback.

Of course something like a “recovery partition” would need to be architected to be as foolproof as possible, but it definitely is possible.

BTW: this is a bit specific to illumos and zfs, but some concerns and ideas from the comments are still worth noting: https://www.illumos.org/rb/r/249/ - especially that the pad area should actually contain not a simple string, but some structure to allow different semantics (next boot or fallback boot, maybe something else).

rgds,
toomas
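
PS: to make the current one-shot behaviour concrete, the usage is a single boot.config-style string written into the pool's boot area (the pool and BE names here are made up):

  zfsbootcfg "zfs:rpool/ROOT/new-be:"

gptzfsboot picks that dataset up on the next boot only - it is a one-time [next] boot setting, as said above - so if the new BE never comes up, the following boot is back on the pool's default bootfs. The fallback scheme would be essentially the same plumbing with the opposite trigger.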