Re: 13.0 failing to boot multiuser on one PC due to system utilities crashing during rc script

From: Konstantin Belousov <kostikbel_at_gmail.com>
Date: Sun, 11 Nov 2018 23:14:34 +0200
On Sun, Nov 11, 2018 at 08:44:24PM +0100, Guido Falsi wrote:
> On 11/11/18 11:10, Guido Falsi wrote:
> > On 11/11/18 00:07, Konstantin Belousov wrote:
> >> On Sat, Nov 10, 2018 at 05:27:09PM +0100, Guido Falsi wrote:
> >>> On 10/11/18 13:08, Guido Falsi wrote:
> >>>> I'll try to bisect things, but it will be a slow process.
> >>>
> >>> I narrowed it down to r339895.
> >> I somehow doubt that this is the case.
> >>
> > 
> > I did not mean to accuse you. Instead thanks for this reply and the
> > suggestions. Really appreciated.
> > 
> > I simply found out that removing that commit from my sources gives me a
> > stable system and reported such finding.
> > 
> > I understand that the actual cause could be an interaction with other
> > code and am ready to review my findings.
> > 
> >> If you take post-r339895 kernel and start e.g. 11.2-RELEASE userspace
> >> (untar the installation into jail to avoid reinstallation), does it
> >> still demonstrate the behaviour ?
> >>
> >> Also try to run pre-r339895 with the 12.0 userspace from e.g. 12.0-BETA4 
> >> builds.
> > 
> > I'll perform such tests. Please allow me some time to report back what I
> > get.
> 
> I performed these tests. I downloaded the 12.0-BETA4 and 11.2
> installation images and replaced the kernels in there. This was faster
> than working with jails on a crippled system.
> 
> r339895 kernel on 11.2-RELEASE causes fsck (launched by rc) to dump core
> and this stops the boot procedure.
> 
> r339894 kernel on 12.0-BETA4 works fine.

Ok, let's try to find the reason.

- When you build your kernels, you do not use any cpu-specific optimization
  flags, do you ?  Also, you follow the standard build procedure and your
  make.conf and src.conf are empty, right ?
- Do you preload a microcode update from the loader ?
- Show the output of sysctl vm.pmap.
- Show verbose dmesg from the boot of the problematic kernel.
  You posted non-verbose dmesg for 12.0-BETA4.
- Enter ddb after booting the problematic kernel.  Do
  db> x/x cpu_stdext_feature
  db> x/x cpu_stdext_feature+4
- From the same ddb session, disassemble e.g. cpu_set_user_tls().
  You could paste the whole disassembly, but what I really want is
  the single line with the call to set_pcb_flagsXXXX; it should be
  either set_pcb_flags_raw or set_pcb_flags_fsgsbase.  To disassemble
  in ddb, do
  db> x/i cpu_set_user_tls
  and then press <enter> repeatedly to get the following instructions.
  (I want the disassembly from ddb and not from gdb/kgdb).
- Try the following patch.

diff --git a/sys/amd64/amd64/machdep.c b/sys/amd64/amd64/machdep.c
index 6e36ae97523..8dafd4b4756 100644
--- a/sys/amd64/amd64/machdep.c
+++ b/sys/amd64/amd64/machdep.c
@@ -2627,8 +2627,8 @@ set_pcb_flags_raw(struct pcb *pcb, const u_int flags)
  * the PCB_FULL_IRET flag is set.  We disable interrupts to sync with
  * context switches.
  */
-static void
-set_pcb_flags_fsgsbase(struct pcb *pcb, const u_int flags)
+void
+set_pcb_flags(struct pcb *pcb, const u_int flags)
 {
 	register_t r;
 
@@ -2649,13 +2649,6 @@ set_pcb_flags_fsgsbase(struct pcb *pcb, const u_int flags)
 	}
 }
 
-DEFINE_IFUNC(, void, set_pcb_flags, (struct pcb *, const u_int), static)
-{
-
-	return ((cpu_stdext_feature & CPUID_STDEXT_FSGSBASE) != 0 ?
-	    set_pcb_flags_fsgsbase : set_pcb_flags_raw);
-}
-
 void
 clear_pcb_flags(struct pcb *pcb, const u_int flags)
 {
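
For reference, the selection that the DEFINE_IFUNC block makes can be
modelled in userland roughly as below.  This is only a sketch and not
kernel code: the model_* names and the plain uint32_t flags word are
made up for illustration, the raw variant is assumed to be a simple OR,
and the fsgsbase stand-in leaves out the extra work the real function
does.  The only hard fact used is that FSGSBASE is bit 0 of
CPUID.(EAX=7,ECX=0):EBX, which is what CPUID_STDEXT_FSGSBASE tests in
cpu_stdext_feature.

```c
#include <stdint.h>

/* CPUID.(EAX=7,ECX=0):EBX bit 0, which FreeBSD calls CPUID_STDEXT_FSGSBASE. */
#define MODEL_STDEXT_FSGSBASE	0x00000001u

/* Hypothetical stand-in for the kernel's function type. */
typedef void (*model_set_flags_fn)(uint32_t *pcb_flags, uint32_t flags);

/* Models set_pcb_flags_raw(): assumed to just OR the new bits in. */
void
model_set_flags_raw(uint32_t *pcb_flags, uint32_t flags)
{
	*pcb_flags |= flags;
}

/*
 * Models set_pcb_flags_fsgsbase().  The real function does extra work
 * before setting the flags; none of that is reproduced here.
 */
void
model_set_flags_fsgsbase(uint32_t *pcb_flags, uint32_t flags)
{
	model_set_flags_raw(pcb_flags, flags);
}

/* Models the ifunc resolver: choose a variant from the feature word. */
model_set_flags_fn
model_resolve(uint32_t stdext_feature)
{
	return ((stdext_feature & MODEL_STDEXT_FSGSBASE) != 0 ?
	    model_set_flags_fsgsbase : model_set_flags_raw);
}
```

Feeding the value printed by "x/x cpu_stdext_feature" into
model_resolve() shows which variant the resolver is expected to pick:
an odd value (bit 0 set) gives the fsgsbase variant, an even one gives
the raw variant.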
Received on Sun Nov 11 2018 - 20:14:50 UTC
