Re: 13.0 failing to boot multiuser on one PC due to system utilities crashing during rc script

From: Guido Falsi <mad_at_madpilot.net>
Date: Mon, 12 Nov 2018 11:52:25 +0100
On 11/11/18 22:14, Konstantin Belousov wrote:
> On Sun, Nov 11, 2018 at 08:44:24PM +0100, Guido Falsi wrote:
>> On 11/11/18 11:10, Guido Falsi wrote:
>>> On 11/11/18 00:07, Konstantin Belousov wrote:
>> I performed these tests. I downloaded the 12.0-BETA4 and 11.2
>> installation images and replaced the kernels in there. This was faster
>> than working with jails on a crippled system.
>>
>> r339895 kernel on 11.2-RELEASE causes fsck (launched by rc) to dump core
>> and this stops the boot procedure.
>>
>> r339894 kernel on 12.0-BETA4 works fine.
> 
> Ok, let's try to find some reason.

The requested files are accessible here:

https://www.madpilot.net/cloud/s/Q9DAGrntnneomSs

> 
> - When you build your kernels, you do not use any cpu-specific optimization
>   flags, do you ?  More, you follow the standard build procedure and your
>   make.conf and src.conf are empty, right ?

At the start I did have some optimizations, but I disabled them all.

I'm building with 'make -j buildkernel'. I usually enable META_MODE, but
I also disabled that and even wiped out the contents of /usr/obj
multiple times to make sure I was getting a clean build.

> - Do you preload a microcode update from the loader ?

At present no, I load it later via rc scripts.

I want to test this, though, and I'll report later if it changes
anything.

> - Show the output of sysctl vm.pmap.
> - Show verbose dmesg from the boot of the problematic kernel.
>   You posted non-verbose dmesg for 12.0-BETA4.

Posted at the link above.

> - Enter ddb, when booted the problematic kernel.  Do
>   db> x/x cpu_stdext_feature

cpu_stdext_feature:     281

>   db> x/x cpu_stdext_feature+4

cpu_stdext_feature2:    0
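
For readers following along, the feature word above can be decoded by hand.
This is only a sketch: it assumes the value ddb printed is hexadecimal (as
the /x modifier requests) and that cpu_stdext_feature holds the EBX output
of CPUID leaf 7, subleaf 0. The bit names are taken from the Intel SDM; the
table is deliberately partial, and decode_stdext is a made-up helper name.

```python
# Partial table of CPUID.(EAX=7,ECX=0):EBX bit positions (Intel SDM).
# Assumption: cpu_stdext_feature mirrors this register.
STDEXT_BITS = {
    0: "FSGSBASE",
    1: "TSC_ADJUST",
    3: "BMI1",
    4: "HLE",
    5: "AVX2",
    7: "SMEP",
    8: "BMI2",
    9: "ERMS",
    10: "INVPCID",
    11: "RTM",
}


def decode_stdext(word):
    """Return the names of all bits set in a 32-bit feature word."""
    names = []
    for bit in range(32):
        if word & (1 << bit):
            # Bits missing from the partial table are reported by number.
            names.append(STDEXT_BITS.get(bit, f"bit{bit}"))
    return names


if __name__ == "__main__":
    # 0x281 is the value reported by "x/x cpu_stdext_feature" above.
    print(decode_stdext(0x281))  # ['FSGSBASE', 'SMEP', 'ERMS']
```

Under those assumptions the CPU advertises FSGSBASE, SMEP and ERMS.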

> - From the same ddb session, disassemble e.g. cpu_set_user_tls().
>   You could paste me whole disassembling, but really I want to know
>   the single line with the call to set_pcb_flagsXXXX, it should be
>   either set_pcb_flags_raw or set_pcb_flags_fsgsbase.  To disassemble
>   in ddb, do
>   db> x/i cpu_set_user_tls
>   and then press <enter> more to get next and next instructions.
>   (I want the disassembly from ddb and not from gdb/kgdb).

cpu_set_user_tls+0x2d:  call    set_pcb_flags_raw
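
As a cross-check of the two ddb results, here is a small sketch comparing
the call target seen in the disassembly with what the feature word would
suggest. It rests on an assumption not stated in this thread: that the
kernel selects set_pcb_flags_fsgsbase when bit 0 (FSGSBASE) of
cpu_stdext_feature is set and set_pcb_flags_raw otherwise. The helper
names expected_variant and matches are made up for illustration.

```python
# Assumption: bit 0 of cpu_stdext_feature is the FSGSBASE feature flag.
CPUID_STDEXT_FSGSBASE = 0x1


def expected_variant(stdext_feature):
    """Guess which set_pcb_flags variant the feature word implies."""
    if stdext_feature & CPUID_STDEXT_FSGSBASE:
        return "set_pcb_flags_fsgsbase"
    return "set_pcb_flags_raw"


def matches(stdext_feature, observed_call):
    """True if the observed call target agrees with the feature word."""
    return expected_variant(stdext_feature) == observed_call


if __name__ == "__main__":
    # Values taken from the ddb session quoted in this message.
    print(expected_variant(0x281))
    print(matches(0x281, "set_pcb_flags_raw"))
```

If that assumption holds, the feature word (0x281) and the observed call
to set_pcb_flags_raw disagree, which may be relevant to the crashes.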


The full ddb session capture is posted at the link above.

> - Try the following patch.
> 

The patch does produce a working kernel. In fact I'm running that kernel
now.

I've also added the broken kernel with its kernel.debug file as a txz
archive at the URL posted above.

Hope this helps. Thanks for your time and effort!

-- 
Guido Falsi <mad_at_madpilot.net>
Received on Mon Nov 12 2018 - 09:52:30 UTC
