Re: CURRENT: supeblock hash failure - CURRENT wrecking disks

From: Kirk McKusick <mckusick_at_mckusick.com>
Date: Wed, 07 Aug 2019 16:59:31 -0000
> Date: Wed, 7 Aug 2019 10:37:29 +0200
> From: "O. Hartmann" <ohartmann_at_walstatt.org>
> To: freebsd-current <freebsd-current_at_freebsd.org>
> Subject: CURRENT: supeblock hash failure - CURRENT wrecking disks
> 
> Hello,
> 
> Today I ran into a catastrophe with r350671. After installing a freshly
> compiled system and rebooting the box, the UEFI loader printed a bunch
> of errors, including some hex numbers, stating that a superblock hash
> is wrong, and then booting stopped at the OK loader prompt.
> 
> I rebooted the machine with the FreeBSD-13-CURRENT image from 1st
> August 2019 and tried to fsck the filesystem(s) on the boot SSD
> (UFS2, journaling and trim on); lots of unresolved block errors
> occurred. But that didn't help much.  Further, after several checks,
> I saw some recent commits to the ffs code and tried to restore a
> copy of the superblock of each filesystem (contrary to the man
> page for fsck_ufs, the first backup superblock resides at 192, not
> 160!). But things then got even worse: it seems the whole /boot
> structure is corrupted, the loader cannot find the recent kernel,
> and kernel.old is crashing.
> 
> What's wrong here :-(
> 
> The box in question was set up 6 weeks ago with FreeBSD 13-CURRENT
> natively. It is now a wreck. Other systems running CURRENT (at the
> most recent revision as of today) were originally installed as
> 12-STABLE/12-CURRENT and "moved on" to what is now 13-CURRENT. They
> do not(!) show the problems reported here.
> 
> Either I hit the crap installing a new system whilst there was a
> problem, or something really strange happened.
> 
> The bad thing is that kernel.old exits/dies with an exception and
> /boot/kernel/ seems to be completely corrupted. Tomorrow I will try to
> install a prepared pkg tar archive FreeBSD-kernel from a CURRENT pkg
> base and hope this will fix the issue.
> 
> Regards,
> 
> oh

The boot code checks the superblock hash and reports if it is wrong,
but ignores the error and continues to try and boot. The reason to
continue is to allow the system to come up so that the superblock
check hash can be fixed by running fsck. So your filesystem had
something more seriously wrong than just a bad superblock hash if
it could not be booted.
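
The report-but-continue policy described above can be sketched as follows.
This is only a toy model, not the loader's code: the block layout (hash
stored in the last four bytes), the helper names, and the use of
zlib.crc32 as a stand-in checksum are all assumptions made for
illustration.

```python
import struct
import zlib

CKHASH_SIZE = 4  # hypothetical layout: the last 4 bytes of the block hold the hash


def compute_ckhash(block: bytes) -> int:
    """Checksum everything except the stored hash field (zlib.crc32 is a stand-in)."""
    return zlib.crc32(block[:-CKHASH_SIZE]) & 0xFFFFFFFF


def seal(payload: bytes) -> bytes:
    """Append a little-endian check hash to a payload, forming a toy 'superblock'."""
    return payload + struct.pack("<I", zlib.crc32(payload) & 0xFFFFFFFF)


def load_superblock(block: bytes, warnings: list) -> bytes:
    """Report a bad hash, then carry on and return the contents anyway."""
    (stored,) = struct.unpack("<I", block[-CKHASH_SIZE:])
    computed = compute_ckhash(block)
    if stored != computed:
        # The mismatch is recorded but is not treated as fatal, so the
        # system can still come up and fsck can repair the hash later.
        warnings.append(
            f"superblock check hash failed: {stored:#010x} != {computed:#010x}"
        )
    return block[:-CKHASH_SIZE]
```

In this model a corrupted block still yields its contents along with one
warning, which is why a bad hash alone should not stop a boot.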

The fix in r350671 was to recompute the superblock check hash in a
place that I had missed earlier. I discovered the error when someone
reported getting superblock check hash errors when booting. But that
error did not cause their system to be unbootable for the reasons
that I explained in the previous paragraph.

If the filesystem started on 12-stable, then moving to 13 would not
have enabled superblock check hashes. They are enabled only when you
run fsck manually and explicitly answer yes to the request to add
superblock check hashes; running fsck -y will not add them.
Filesystems created on 13 will get superblock check hashes. But if you
boot a 13 filesystem using a 12-stable kernel, they will be disabled and
left disabled even if you boot the filesystem on 13 again.
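
The rules in the paragraph above can be summarized as a small state model.
This is a hedged sketch of the described behaviour only; the class, the
function names, and the parameters are hypothetical and do not correspond
to any real fsck or kernel interface.

```python
from dataclasses import dataclass


@dataclass
class Filesystem:
    ckhash_enabled: bool


def newfs(release: int) -> Filesystem:
    # Filesystems created on 13 get check hashes; earlier releases do not.
    return Filesystem(ckhash_enabled=release >= 13)


def boot(fs: Filesystem, kernel_release: int) -> None:
    # A 12-stable kernel disables the hash; booting 13 again never re-enables it.
    if kernel_release < 13:
        fs.ckhash_enabled = False


def fsck(fs: Filesystem, assume_yes: bool = False,
         answer_yes_to_ckhash: bool = False) -> None:
    # Models fsck as shipped with 13. Only a manual run with an explicit
    # "yes" adds the hash; assume_yes (the -y case) does not.
    if not fs.ckhash_enabled and not assume_yes and answer_yes_to_ckhash:
        fs.ckhash_enabled = True


# A filesystem that started on 12-stable and moved to 13 stays without hashes.
upgraded = newfs(12)
boot(upgraded, 13)
fsck(upgraded, assume_yes=True)
```

After these calls `upgraded.ckhash_enabled` is still false; only
`fsck(upgraded, answer_yes_to_ckhash=True)` would turn it on in this model.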

Thanks for pointing out the error on the fsck_ufs manual page. The first
backup superblock moved from 160 to 192 when the default block size was
raised from 16K to 32K. I have corrected the page in r350682.
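
The move from 160 to 192 can be reproduced with a little arithmetic. The
sketch below is an assumption-laden illustration rather than the newfs
source: it assumes the primary UFS2 superblock sits at byte offset 65536,
that 8192 bytes are reserved for it, that the default block holds 8
fragments, that the first backup starts at the next block-aligned
fragment, and that the location is quoted in 512-byte sectors.

```python
SBLOCK_UFS2 = 65536   # assumed byte offset of the primary UFS2 superblock
SBLOCKSIZE = 8192     # assumed bytes reserved for a superblock
SECTOR = 512          # assumed unit in which backup locations are quoted
FRAGS_PER_BLOCK = 8   # assumed default block-to-fragment ratio


def howmany(x: int, y: int) -> int:
    """Number of y-sized units needed to hold x (ceiling division)."""
    return -(-x // y)


def roundup(x: int, y: int) -> int:
    """Round x up to the next multiple of y."""
    return howmany(x, y) * y


def first_backup_sector(block_size: int) -> int:
    """Sector number of the first backup superblock under the assumptions above."""
    frag_size = block_size // FRAGS_PER_BLOCK
    # Fragments needed to clear the primary superblock, rounded to a whole block.
    frags = roundup(howmany(SBLOCK_UFS2 + SBLOCKSIZE, frag_size), FRAGS_PER_BLOCK)
    return frags * frag_size // SECTOR
```

Under these assumptions, `first_backup_sector(16384)` gives 160 and
`first_backup_sector(32768)` gives 192, matching the old and new values.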

	Kirk McKusick
Received on Wed Aug 07 2019 - 14:59:31 UTC
