Re: 13.0 failing to boot multiuser on one PC due to system utilities crashing during rc script

From: Konstantin Belousov <kostikbel_at_gmail.com>
Date: Sun, 11 Nov 2018 01:07:44 +0200

On Sat, Nov 10, 2018 at 05:27:09PM +0100, Guido Falsi wrote:
> On 10/11/18 13:08, Guido Falsi wrote:
> > Hi,
> > 
> > Today I was updating my home machines to recent head, r340303.
> > Previously I was running r339449.
> > 
> > I have a build machine where I build base packages (it also runs
> > poudriere). I updated that machine using packages I built successfully.
> > It is running fine and has also successfully rebuilt a full ports package
> > set on the new head.
> > 
> > After that I upgraded, using the same package set, another machine, a PC
> > from around three years ago with an i5. After upgrade the kernel boots
> > fine but when running the rc script to go multiuser some system
> > utilities fail, especially zfs, making it impossible for the machine to
> > complete the boot process.
> > 
> > I have also tested booting from the memstick snapshot images:
> > 
> > FreeBSD-13.0-CURRENT-amd64-20181107-r340239-memstick.img
> > FreeBSD-13.0-CURRENT-amd64-20181101-r339979-memstick.img
> > 
> > and both are also failing to go multiuser. The utility failing in this
> > case is fsck, which, like zfs before, dumps core.
> > 
> > I see a pattern where only disk related utilities crash.
> > 
> > The 12.0-BETA4 installation memstick works fine though.
> > 
> > So clearly something changed between r339449 and r340303 which causes
> > incompatibility with my hardware.
> > 
> > I'll try to bisect things, but it will be a slow process.
> 
> I narrowed it down to r339895.
I somehow doubt that this is the case.

If you take a post-r339895 kernel and start e.g. an 11.2-RELEASE userspace
(untar the installation into a jail to avoid reinstallation), does it
still demonstrate the behaviour?

Also try to run a pre-r339895 kernel with the 12.0 userspace from e.g. the
12.0-BETA4 builds.
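
The two cross-tests above form a small kernel/userspace matrix. As a rough
sketch of how their outcomes might be read (the function name and the wording
of each hint are assumptions made for illustration, not something stated in
this thread):

```python
# Illustrative only: interpret_cross_tests() and its hint strings are
# hypothetical; they merely summarise the two experiments suggested above.
def interpret_cross_tests(new_kernel_old_userspace_ok, old_kernel_new_userspace_ok):
    """Map the two cross-test results to a hint about where to look next.

    new_kernel_old_userspace_ok: post-r339895 kernel with an 11.2-RELEASE
        userspace (e.g. in a jail) runs without crashes.
    old_kernel_new_userspace_ok: pre-r339895 kernel with a 12.0-BETA4
        userspace runs without crashes.
    """
    if new_kernel_old_userspace_ok and old_kernel_new_userspace_ok:
        # Neither mixed combination reproduces the crash.
        return "needs the head kernel and head userspace together"
    if not new_kernel_old_userspace_ok and old_kernel_new_userspace_ok:
        # Only the newer kernel is common to the failing case.
        return "suspect kernel side"
    if new_kernel_old_userspace_ok and not old_kernel_new_userspace_ok:
        # The newer kernel behaves; the failure follows the other setup.
        return "suspect userspace or test environment"
    # Both combinations fail, so the revision alone does not explain it.
    return "not isolated; suspect environment or hardware"


if __name__ == "__main__":
    print(interpret_cross_tests(False, True))
```

These hints are only a starting point; each combination still has to be built
and booted cleanly for the result to mean anything.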

> 
> I'm not sure why it fails; it goes beyond my knowledge. The change looks
> harmless, but clearly isn't.
Usually it means that the bisect went wrong and your environment failed
to cleanly isolate the change.
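
To illustrate how a single unreliable test result can steer a bisection to
the wrong revision, here is a minimal sketch. The endpoints r339449 (good)
and r340303 (bad) come from the report; r339700, the choice of "real" culprit
and the stale-build scenario are hypothetical values picked for the example.

```python
def first_bad(revisions, is_bad):
    """Return the first revision for which is_bad() reports a failure.

    Assumes revisions is sorted, revisions[0] is known good,
    revisions[-1] is known bad, and results are monotonic
    (once a revision is bad, all later ones are bad too).
    """
    lo, hi = 0, len(revisions) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid
        else:
            lo = mid
    return revisions[hi]


# r339700 is a made-up intermediate revision for this example.
REVISIONS = [339449, 339700, 339895, 339979, 340239, 340303]


def clean_test(rev):
    # Hypothetical: suppose the real regression entered at r339979.
    return rev >= 339979


def contaminated_test(rev):
    # Same as clean_test, except the r339895 build is stale (for example,
    # it accidentally contains newer objects) and therefore also crashes.
    return rev >= 339979 or rev == 339895


if __name__ == "__main__":
    print(first_bad(REVISIONS, clean_test))
    print(first_bad(REVISIONS, contaminated_test))
```

With the clean test the search lands on the hypothetical culprit; with one
contaminated build it converges on r339895 instead, even though that revision
is innocent in this made-up scenario.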

> 
> My impression is that the other conditions not moved inside the ifunc
> also play a role so such optimization is not possible on all systems.
> 
> > 
> > I have put dmesg and pciconf output here in case it could be useful:
> > 
> > https://people.freebsd.org/~madpilot/boot_fail/
This is Haswell, right? It is exactly the same micro-architecture as the
machine where I tested this series of changes.
Received on Sat Nov 10 2018 - 22:08:04 UTC
