Re: EARLY_AP_STARTUP hangs during boot

From: Gary Jennejohn <gljennjohn_at_gmail.com>
Date: Sun, 31 Jul 2016 11:29:14 +0200
On Sat, 30 Jul 2016 12:03:59 -0700
John Baldwin <jhb_at_freebsd.org> wrote:

> On Saturday, July 30, 2016 09:44:22 AM Gary Jennejohn wrote:
> > On Fri, 29 Jul 2016 13:17:42 -0700
> > John Baldwin <jhb_at_freebsd.org> wrote:
> >   
> > > On Thursday, July 28, 2016 12:31:31 AM Gary Jennejohn wrote:  
> > > > Well, now I know that ULE is a prerequisite for EARLY_AP_STARTUP!  I
> > > > wasn't aware of that.  I prefer BSD and that's the scheduler I did
> > > > the first tests with.
> > > > 
> > > > But with the ULE scheduler the system comes up all the way.
> > > > 
> > > > It would be nice if the BSD scheduler could also be modified to
> > > > work with EARLY_AP_STARTUP.    
> > > 
> > > I wasn't able to reproduce your hang with 4BSD, but I think I see a
> > > possible problem.  Try this:
> > > 
> > > diff --git a/sys/kern/sched_4bsd.c b/sys/kern/sched_4bsd.c
> > > index 7de56b6..d53331a 100644
> > > --- a/sys/kern/sched_4bsd.c
> > > +++ b/sys/kern/sched_4bsd.c
> > > @@ -327,7 +327,6 @@ maybe_preempt(struct thread *td)
> > >  	 *  - The current thread has a higher (numerically lower) or
> > >  	 *    equivalent priority.  Note that this prevents curthread from
> > >  	 *    trying to preempt to itself.
> > > -	 *  - It is too early in the boot for context switches (cold is set).
> > >  	 *  - The current thread has an inhibitor set or is in the process of
> > >  	 *    exiting.  In this case, the current thread is about to switch
> > >  	 *    out anyways, so there's no point in preempting.  If we did,
> > > @@ -348,7 +347,7 @@ maybe_preempt(struct thread *td)
> > >  			("maybe_preempt: trying to run inhibited thread"));
> > >  	pri = td->td_priority;
> > >  	cpri = ctd->td_priority;
> > > -	if (panicstr != NULL || pri >= cpri || cold /* || dumping */ ||
> > > +	if (panicstr != NULL || pri >= cpri /* || dumping */ ||
> > >  	    TD_IS_INHIBITED(ctd))
> > >  		return (0);
> > >  #ifndef FULL_PREEMPTION
> > > @@ -1127,7 +1126,7 @@ forward_wakeup(int cpunum)
> > >  	if ((!forward_wakeup_enabled) ||
> > >  	     (forward_wakeup_use_mask == 0 && forward_wakeup_use_loop == 0))
> > >  		return (0);
> > > -	if (!smp_started || cold || panicstr)
> > > +	if (!smp_started || panicstr)
> > >  		return (0);
> > >  
> > >  	forward_wakeups_requested++;
> > >   
> > 
> > Thanks, but with this patch the kernel hangs in exactly the same
> > place as before - after the HPET output.
> > 
> > Maybe I'm missing some kernel option which ULE works around, or
> > something like that.  
> 
> Hmm, ok.  Please add KTR_RUNQ and KTR_SMP to the KTR masks, that is
> 'options KTR_COMPILE=(KTR_PROC|KTR_RUNQ|KTR_SMP)' and
> 'options KTR_MASK=(KTR_PROC|KTR_RUNQ|KTR_SMP)'
> 
> Please also add this patch (on top of the previous patch):
> 
> diff --git a/sys/kern/sched_4bsd.c b/sys/kern/sched_4bsd.c
> index 2973a23..bab2278 100644
> --- a/sys/kern/sched_4bsd.c
> +++ b/sys/kern/sched_4bsd.c
> @@ -1278,6 +1278,8 @@ sched_add(struct thread *td, int flags)
>         KASSERT(td->td_flags & TDF_INMEM,
>             ("sched_add: thread swapped out"));
>  
> +       CTR2(KTR_PROC, "sched_add: thread %d (%s)", td->td_tid,
> +           sched_tdname(td));
>         KTR_STATE2(KTR_SCHED, "thread", sched_tdname(td), "runq add",
>             "prio:%d", td->td_priority, KTR_ATTR_LINKED,
>             sched_tdname(curthread));
> diff --git a/sys/x86/x86/cpu_machdep.c b/sys/x86/x86/cpu_machdep.c
> index f07b97e..1f418f1 100644
> --- a/sys/x86/x86/cpu_machdep.c
> +++ b/sys/x86/x86/cpu_machdep.c
> @@ -440,6 +440,7 @@ cpu_idle_wakeup(int cpu)
>                 return (0);
>         if (*state == STATE_MWAIT)
>                 *state = STATE_RUNNING;
> +       CTR1(KTR_PROC, "cpu_idle_wakeup: wokeup CPU %d", cpu);
>         return (1);
>  }
> 
> (I haven't tried compiling it; you might have to add the sys/ktr.h
> header to cpu_machdep.c if it doesn't build.)
> 
> Hopefully we will get some better trace messages before it hangs
> with this added info.  The root issue seems to be that 4BSD is
> pinning thread0 to some other CPU (due to sched_bind that happens
> inside of bus_bind_intr() when the HPET driver pins IRQs to CPUs)
> and that other CPU isn't waking up to realize it needs to run thread0.
> 

It compiled with no changes needed.

Even though I set MAXCPU to a mere 2, the boot still hadn't
completed after 90 minutes and I broke it off.  I still have
the kernel, so I can try it another time when I have less need
for my FreeBSD box.
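
For anyone following along, here is a rough userland sketch of how I
understand the two KTR options quoted above to interact: KTR_COMPILE
decides which trace classes are built into the kernel at all, and
KTR_MASK supplies the initial value of the run-time mask.  This is not
kernel code.  The bit values, trace_record(), TRACE() and run_demo()
are all made up for illustration; only the option names and the class
names come from the thread.

```c
#include <assert.h>
#include <stdio.h>

/* Placeholder bit values for illustration; not the kernel's real ones. */
#define KTR_SMP   0x1u
#define KTR_PROC  0x2u
#define KTR_RUNQ  0x4u
#define KTR_SCHED 0x8u

/* Compile-time mask, as in 'options KTR_COMPILE=(...)'. */
#define KTR_COMPILE (KTR_PROC | KTR_RUNQ | KTR_SMP)

/* Run-time mask; 'options KTR_MASK=(...)' supplies its initial value. */
static unsigned int ktr_mask = KTR_PROC | KTR_RUNQ | KTR_SMP;

/* Number of records that passed both masks. */
static int ktr_logged;

/* Hypothetical stand-in for the kernel's trace-record function. */
static void
trace_record(unsigned int cls, const char *msg)
{
	assert(msg != NULL);
	if ((ktr_mask & cls) == 0)
		return;		/* class disabled at run time */
	ktr_logged++;
	printf("trace: %s\n", msg);
}

/* Hypothetical stand-in for CTRn(): the compile-time test comes first. */
#define TRACE(cls, msg) do {				\
	if ((KTR_COMPILE & (cls)) != 0)			\
		trace_record((cls), (msg));		\
} while (0)

/* Emits three trace points and returns how many were recorded. */
static int
run_demo(void)
{
	ktr_logged = 0;
	ktr_mask = KTR_PROC | KTR_RUNQ | KTR_SMP;
	TRACE(KTR_PROC, "in both masks");	/* recorded */
	TRACE(KTR_SCHED, "not compiled in");	/* dropped by KTR_COMPILE */
	ktr_mask &= ~KTR_SMP;
	TRACE(KTR_SMP, "masked at run time");	/* dropped by ktr_mask */
	return (ktr_logged);
}
```

Calling run_demo() from a small main() records only the first of the
three trace points, which is why a class has to appear in both options
before its messages can show up in the trace buffer.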

-- 
Gary Jennejohn
Received on Sun Jul 31 2016 - 07:29:20 UTC
