Re: EARLY_AP_STARTUP hangs during boot

From: John Baldwin <jhb_at_freebsd.org>
Date: Sat, 30 Jul 2016 12:03:59 -0700

On Saturday, July 30, 2016 09:44:22 AM Gary Jennejohn wrote:
> On Fri, 29 Jul 2016 13:17:42 -0700
> John Baldwin <jhb_at_freebsd.org> wrote:
> 
> > On Thursday, July 28, 2016 12:31:31 AM Gary Jennejohn wrote:
> > > Well, now I know that ULE is a prerequisite for EARLY_AP_STARTUP!  I
> > > wasn't aware of that.  I prefer BSD and that's the scheduler I did
> > > the first tests with.
> > > 
> > > But with the ULE scheduler the system comes up all the way.
> > > 
> > > It would be nice if the BSD scheduler could also be modified to
> > > work with EARLY_AP_STARTUP.  
> > 
> > I wasn't able to reproduce your hang with 4BSD, but I think I see a
> > possible problem.  Try this:
> > 
> > diff --git a/sys/kern/sched_4bsd.c b/sys/kern/sched_4bsd.c
> > index 7de56b6..d53331a 100644
> > --- a/sys/kern/sched_4bsd.c
> > +++ b/sys/kern/sched_4bsd.c
> > @@ -327,7 +327,6 @@ maybe_preempt(struct thread *td)
> >  	 *  - The current thread has a higher (numerically lower) or
> >  	 *    equivalent priority.  Note that this prevents curthread from
> >  	 *    trying to preempt to itself.
> > -	 *  - It is too early in the boot for context switches (cold is set).
> >  	 *  - The current thread has an inhibitor set or is in the process of
> >  	 *    exiting.  In this case, the current thread is about to switch
> >  	 *    out anyways, so there's no point in preempting.  If we did,
> > @@ -348,7 +347,7 @@ maybe_preempt(struct thread *td)
> >  			("maybe_preempt: trying to run inhibited thread"));
> >  	pri = td->td_priority;
> >  	cpri = ctd->td_priority;
> > -	if (panicstr != NULL || pri >= cpri || cold /* || dumping */ ||
> > +	if (panicstr != NULL || pri >= cpri /* || dumping */ ||
> >  	    TD_IS_INHIBITED(ctd))
> >  		return (0);
> >  #ifndef FULL_PREEMPTION
> > @@ -1127,7 +1126,7 @@ forward_wakeup(int cpunum)
> >  	if ((!forward_wakeup_enabled) ||
> >  	     (forward_wakeup_use_mask == 0 && forward_wakeup_use_loop == 0))
> >  		return (0);
> > -	if (!smp_started || cold || panicstr)
> > +	if (!smp_started || panicstr)
> >  		return (0);
> >  
> >  	forward_wakeups_requested++;
> > 
> 
> Thanks, but with this patch the kernel hangs in exactly the same
> place as before - after the HPET output.
> 
> Maybe I'm missing some kernel option which ULE works around, or
> something like that.

Hmm, ok.  Please add KTR_RUNQ and KTR_SMP to both KTR masks in your
kernel config, that is:
'options KTR_COMPILE=(KTR_PROC|KTR_RUNQ|KTR_SMP)' and
'options KTR_MASK=(KTR_PROC|KTR_RUNQ|KTR_SMP)'.

Please also add this patch (on top of the previous patch):

diff --git a/sys/kern/sched_4bsd.c b/sys/kern/sched_4bsd.c
index 2973a23..bab2278 100644
--- a/sys/kern/sched_4bsd.c
+++ b/sys/kern/sched_4bsd.c
@@ -1278,6 +1278,8 @@ sched_add(struct thread *td, int flags)
        KASSERT(td->td_flags & TDF_INMEM,
            ("sched_add: thread swapped out"));
 
+       CTR2(KTR_PROC, "sched_add: thread %d (%s)", td->td_tid,
+           sched_tdname(td));
        KTR_STATE2(KTR_SCHED, "thread", sched_tdname(td), "runq add",
            "prio:%d", td->td_priority, KTR_ATTR_LINKED,
            sched_tdname(curthread));
diff --git a/sys/x86/x86/cpu_machdep.c b/sys/x86/x86/cpu_machdep.c
index f07b97e..1f418f1 100644
--- a/sys/x86/x86/cpu_machdep.c
+++ b/sys/x86/x86/cpu_machdep.c
@@ -440,6 +440,7 @@ cpu_idle_wakeup(int cpu)
                return (0);
        if (*state == STATE_MWAIT)
                *state = STATE_RUNNING;
+       CTR1(KTR_PROC, "cpu_idle_wakeup: wokeup CPU %d", cpu);
        return (1);
 }

(I haven't tried compiling it; you might have to add the sys/ktr.h
header to cpu_machdep.c if it doesn't build.)

Hopefully this added info will give us some better trace messages
before it hangs.  The root issue seems to be that 4BSD is pinning
thread0 to some other CPU (due to the sched_bind() that happens
inside of bus_bind_intr() when the HPET driver pins IRQs to CPUs)
and that other CPU isn't waking up to realize it needs to run thread0.
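
To make that suspected failure mode concrete, here is a rough userland
sketch of how a wakeup for a bound thread could get lost.  It is only a
toy model, not kernel code: struct boot_state, struct toy_cpu,
wakeup_gate() and enqueue_bound() are names made up for illustration.
The gate condition mirrors the "-" and "+" lines of the
forward_wakeup() hunk in the earlier patch, and the model assumes that
with EARLY_AP_STARTUP the APs can already be running while cold is
still set.

```c
#include <stdbool.h>

/* Toy stand-ins for the kernel globals tested in forward_wakeup(). */
struct boot_state {
	bool smp_started;	/* APs have been started */
	bool cold;		/* still in early boot */
	bool panicking;		/* stands in for panicstr != NULL */
};

/* Hypothetical per-CPU state; not a real kernel structure. */
struct toy_cpu {
	bool idle;		/* CPU is sleeping in its idle loop */
	int runq_len;		/* threads queued for this CPU */
};

/*
 * Decide whether a wakeup may be sent.  check_cold == true models the
 * old "-" line (!smp_started || cold || panicstr); check_cold == false
 * models the patched "+" line (!smp_started || panicstr).
 */
static bool
wakeup_gate(const struct boot_state *s, bool check_cold)
{
	if (!s->smp_started || s->panicking)
		return (false);
	if (check_cold && s->cold)
		return (false);
	return (true);
}

/*
 * Queue a thread bound to 'target'.  Returns true if the target CPU
 * will notice the new work.  A busy CPU is assumed (for this model
 * only) to look at its queue on its own; an idle one has to be woken.
 */
static bool
enqueue_bound(struct toy_cpu *target, const struct boot_state *s,
    bool check_cold)
{
	target->runq_len++;
	if (!target->idle)
		return (true);
	if (wakeup_gate(s, check_cold)) {
		target->idle = false;
		return (true);
	}
	/* Work is queued but the CPU stays asleep: the hang scenario. */
	return (false);
}
```

In this model, an idle CPU with smp_started and cold both set never
sees the queued thread while the cold check is in place, and does see
it once the check is dropped.  Since the hang persists with the earlier
patch applied, this particular gate is evidently not the whole story on
the real system, which is what the extra trace points are meant to
narrow down.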

-- 
John Baldwin
Received on Sat Jul 30 2016 - 17:14:12 UTC
