Re: sched_4bsd startup crash trying to run a bound thread on an AP that hasn't started

From: Attilio Rao <attilio_at_freebsd.org>
Date: Wed, 6 Apr 2011 13:21:04 -0400

2011/4/6 Ryan Stone <rysto32_at_gmail.com>:
> On Wed, Apr 6, 2011 at 8:36 AM, John Baldwin <jhb_at_freebsd.org> wrote:
>> Hummm.  Patching 4BSD to use the same route as ULE may be the best solution
>> for now if that is easiest.  Alternatively, you could change 4BSD's
>> sched_add() to not try to kick other CPUs until smp_started is true.
>
> At first I thought that it was a consequence of the way it does CPU
> affinity, but now I see that it shortcuts if smp_started is not true.
> How about something like the following for 4BSD?
>
> --- sched_4bsd.c        (revision 220222)
> +++ sched_4bsd.c        (working copy)
> @@ -1242,14 +1242,14 @@
>        }
>        TD_SET_RUNQ(td);
>
> -       if (td->td_pinned != 0) {
> +       if (smp_started && td->td_pinned != 0) {
>                cpu = td->td_lastcpu;
>                ts->ts_runq = &runq_pcpu[cpu];
>                single_cpu = 1;
>                CTR3(KTR_RUNQ,
>                    "sched_add: Put td_sched:%p(td:%p) on cpu%d runq", ts, td,
>                    cpu);
> -       } else if (td->td_flags & TDF_BOUND) {
> +       } else if (smp_started && (td->td_flags & TDF_BOUND)) {
>                /* Find CPU from bound runq. */
>                KASSERT(SKE_RUNQ_PCPU(ts),
>                    ("sched_add: bound td_sched not on cpu runq"));
> @@ -1258,7 +1258,7 @@
>                CTR3(KTR_RUNQ,
>                    "sched_add: Put td_sched:%p(td:%p) on cpu%d runq", ts, td,
>                    cpu);
> -       } else if (ts->ts_flags & TSF_AFFINITY) {
> +       } else if (smp_started && (ts->ts_flags & TSF_AFFINITY)) {
>                /* Find a valid CPU for our cpuset */
>                cpu = sched_pickcpu(td);
>                ts->ts_runq = &runq_pcpu[cpu];
>
> The flow control is a bit awkward because of the multiple
> affinity/bound cpu cases.  If somebody prefers the code to be
> structured differently I'd be open to suggestions.

That is more or less what ULE does. In ULE it is simpler because
everything goes through sched_pickcpu(), which always returns CPU 0
as long as the APs have not been started yet.

I would also consider adding a comment above the checks explaining
why they are there, but otherwise it looks fine.
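
To make the effect of the smp_started checks concrete, here is a rough
userland model of the decision the patched sched_add() makes. It is only
a sketch, not kernel code: struct model_thread, model_pick_runq() and
MODEL_NOCPU are made-up names for illustration, the three bound/pinned/
affinity cases are collapsed into a single target CPU, and it assumes
that a thread matching none of the cases ends up on the global run queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for "no per-CPU queue, use the global one". */
enum { MODEL_NOCPU = -1 };

/* Hypothetical, simplified view of the thread state sched_add() tests. */
struct model_thread {
	bool pinned;     /* models td->td_pinned != 0 */
	bool bound;      /* models td->td_flags & TDF_BOUND */
	bool affinity;   /* models ts->ts_flags & TSF_AFFINITY */
	int  wanted_cpu; /* CPU the thread would be queued on */
};

/*
 * Return the CPU whose run queue should receive the thread, or
 * MODEL_NOCPU for the global run queue.
 *
 * Until the APs have started, only the boot CPU is running, so a
 * thread placed on another CPU's queue would never be picked up.
 * Ignore pinning, binding and affinity until smp_started is set.
 */
static int
model_pick_runq(const struct model_thread *td, bool smp_started)
{
	assert(td != NULL);

	if (!smp_started)
		return (MODEL_NOCPU);
	if (td->pinned || td->bound || td->affinity)
		return (td->wanted_cpu);
	return (MODEL_NOCPU);
}
```

In this model a thread bound to CPU 2 goes to the global queue while
smp_started is false and to CPU 2's queue once it is true, which is the
behaviour the three added checks are meant to give.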

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein
Received on Wed Apr 06 2011 - 15:21:05 UTC