Re: sched_pin() bug in SCHED_ULE

From: John Baldwin <jhb@freebsd.org>
Date: Wed, 1 Sep 2010 13:19:12 -0400
On Wednesday, September 01, 2010 12:54:13 pm mdf@freebsd.org wrote:
> On Wed, Sep 1, 2010 at 6:49 AM, John Baldwin <jhb@freebsd.org> wrote:
> > On Tuesday, August 31, 2010 2:53:12 pm mdf@freebsd.org wrote:
> >> On Tue, Aug 31, 2010 at 10:16 AM,  <mdf@freebsd.org> wrote:
> >> > I recorded the stack any time ts->ts_cpu was set and when a thread was
> >> > migrated by sched_switch() I printed out the recorded info.  Here's
> >> > what I found:
> >> >
> >> >
> >> > XXX bug 67957: moving 0xffffff003ff9b800 from 3 to 1
> >> > [1]: pin 0 state 4 move 3 -> 1 done by 0xffffff000cc44000:
> >> > #0 0xffffffff802b36b4 at bug67957+0x84
> >> > #1 0xffffffff802b5dd4 at sched_affinity+0xd4
> >> > #2 0xffffffff8024a707 at cpuset_setthread+0x137
> >> > #3 0xffffffff8024aeae at cpuset_setaffinity+0x21e
> >> > #4 0xffffffff804a82df at freebsd32_cpuset_setaffinity+0x4f
> >> > #5 0xffffffff80295f49 at isi_syscall+0x99
> >> > #6 0xffffffff804a630e at ia32_syscall+0x1ce
> >> > #7 0xffffffff8046dc60 at Xint0x80_syscall+0x60
> >> > [0]: pin 0 state 2 move 0 -> 3 done by 0xffffff000cc44000:
> >> > #0 0xffffffff802b36b4 at bug67957+0x84
> >> > #1 0xffffffff802b4ad8 at sched_add+0xe8
> >> > #2 0xffffffff8029b96a at create_thread+0x34a
> >> > #3 0xffffffff8029badc at kern_thr_new+0x8c
> >> > #4 0xffffffff804a8912 at freebsd32_thr_new+0x122
> >> > #5 0xffffffff80295f49 at isi_syscall+0x99
> >> > #6 0xffffffff804a630e at ia32_syscall+0x1ce
> >> > #7 0xffffffff8046dc60 at Xint0x80_syscall+0x60
> >> >
> >> > So one thread in the process called cpuset_setaffinity(2), and another
> >> > thread in the process was forcibly migrated by the IPI while returning
> >> > from a syscall, even though it had td_pinned set.
> >> >
> >> > Given this path, it seems reasonable to me to skip the migrate if we
> >> > notice THREAD_CAN_MIGRATE is false.
> >> >
> >> > Opinions?  My debug code is below.  I'll try to write a short testcase
> >> > that exhibits this bug.
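
For what it's worth, a stress test along the lines below seems like the sort
of thing that ought to tickle this: one thread bounces the process affinity
mask around while the others hammer cheap syscalls, so that sooner or later
the resulting IPI_PREEMPT lands on a thread that happens to be pinned (e.g.
by witness) on its way back out of the kernel.  This is only a rough,
untested sketch (the 4-CPU assumption, the choice of getpid(), and the
minimal error handling are all arbitrary), not the testcase promised above:

#include <sys/param.h>
#include <sys/cpuset.h>
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

/* Workers just keep making cheap syscalls so they pass through userret(). */
static void *
spinner(void *arg)
{

        for (;;)
                (void)getpid();
        return (NULL);
}

int
main(void)
{
        pthread_t td[4];
        cpuset_t mask;
        int cpu, i;

        for (i = 0; i < 4; i++)
                if (pthread_create(&td[i], NULL, spinner, NULL) != 0)
                        exit(1);

        /* Bounce the whole process from CPU to CPU as fast as possible. */
        for (cpu = 0;; cpu = (cpu + 1) % 4) {
                CPU_ZERO(&mask);
                CPU_SET(cpu, &mask);
                if (cpuset_setaffinity(CPU_LEVEL_WHICH, CPU_WHICH_PID, -1,
                    sizeof(mask), &mask) != 0)
                        exit(1);
        }
}
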
> >>
> >> Just a few more thoughts on this.  The check in sched_affinity for
> >> THREAD_CAN_MIGRATE is racy.  Since witness uses sched_pin, it's not
> >> simple to take the THREAD lock around an increment of td_pinned.  So
> >> I'm looking for suggestions on the best way to fix this issue.  My
> >> thoughts:
> >>
> >> 1) add a check in sched_switch() for THREAD_CAN_MIGRATE
> >> 2) have WITNESS not use sched_pin, and take the THREAD lock when
> >> modifying td_pinned
> >> 3) have the IPI_PREEMPT handler notice the thread is pinned (and not
> >> trying to bind) and postpone the mi_switch, just like it postpones
> >> when a thread is in a critical section.
> >>
> >> Except for the potential complexity of implementation, I think (2) is
> >> the best solution.
> >
> > I don't like 2) only because you'd really need to fix all the other places
> > that use sched_pin() to grab the thread lock.  That would be a bit expensive.
> > I think the problem is code that checks THREAD_CAN_MIGRATE() for running
> > threads that aren't curthread.
> >
> > The sched_affinity() bug is probably my fault FWIW.  I think something along
> > the lines of 1) is the best approach.  Maybe something like the patch below.
> > It moves the CPU assignment out of sched_affinity() and into sched_switch()
> > instead, so that it takes effect on the next context switch for
> > this thread.  That is the point at which the CPU binding actually takes effect
> > anyway since sched_affinity() doesn't force an immediate context switch.  This
> > patch is relative to 7.  The patch for 9 is nearly identical (just a fixup for
> > ipi_cpu() vs ipi_selected()).
> >
> > Index: sched_ule.c
> > ===================================================================
> > --- sched_ule.c (revision 212065)
> > +++ sched_ule.c (working copy)
> > @@ -1885,6 +1885,8 @@
> >                srqflag = (flags & SW_PREEMPT) ?
> >                    SRQ_OURSELF|SRQ_YIELDING|SRQ_PREEMPTED :
> >                    SRQ_OURSELF|SRQ_YIELDING;
> > +               if (THREAD_CAN_MIGRATE(td) && !THREAD_CAN_SCHED(td, ts->ts_cpu))
> > +                       ts->ts_cpu = sched_pickcpu(td, 0);
> >                if (ts->ts_cpu == cpuid)
> >                        tdq_add(tdq, td, srqflag);
> >                else
> > @@ -2536,7 +2538,6 @@
> >  {
> >  #ifdef SMP
> >        struct td_sched *ts;
> > -       int cpu;
> >
> >        THREAD_LOCK_ASSERT(td, MA_OWNED);
> >        ts = td->td_sched;
> > @@ -2550,17 +2551,13 @@
> >        if (!TD_IS_RUNNING(td))
> >                return;
> >        td->td_flags |= TDF_NEEDRESCHED;
> > -       if (!THREAD_CAN_MIGRATE(td))
> > -               return;
> >        /*
> > -        * Assign the new cpu and force a switch before returning to
> > -        * userspace.  If the target thread is not running locally send
> > -        * an ipi to force the issue.
> > +        * Force a switch before returning to userspace.  If the
> > +        * target thread is not running locally send an ipi to force
> > +        * the issue.
> >         */
> > -       cpu = ts->ts_cpu;
> > -       ts->ts_cpu = sched_pickcpu(td, 0);
> > -       if (cpu != PCPU_GET(cpuid))
> > -               ipi_selected(1 << cpu, IPI_PREEMPT);
> > +       if (td != curthread)
> > +               ipi_selected(1 << ts->ts_cpu, IPI_PREEMPT);
> >  #endif
> >  }
> 
> I will test this patch out; thanks for the help!
> 
> Two questions:
> 
> 1) How does a thread get moved between CPUs when it's not running?  I
> see that we change the runqueue for non-running threads that are on a
> runqueue.  Does the code always check for THREAD_CAN_SCHED when adding
> a sleeping thread to a runqueue on wakeup?

Yes.  That is done in sched_add() when it calls sched_pickcpu().

> 2) I assume sched_switch() runs a lot more often than sched_pin() or
> sched_affinity(), so it would make sense to take any performance
> penalty of increased work in either of those places rather than in the
> scheduler.  I suppose the two memory references for THREAD_CAN_MIGRATE
> and THREAD_CAN_SCHED won't be that expensive.

Well, sched_pin() is actually used fairly often, and importantly, it is
used under the assumption that it is very cheap (specifically, cheaper than
critical_enter() and critical_exit()).
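
For reference, sched_pin() really is just an unlocked bump of a per-thread
counter (this is roughly what the sys/sched.h inlines look like, quoted from
memory rather than verbatim), which is why 2) would change its cost model for
every caller if it suddenly had to take the thread lock:

/*
 * Roughly the sys/sched.h inlines: no lock, no barrier, just a counter in
 * curthread that sched_switch()/sched_affinity() can look at later.
 */
static __inline void
sched_pin(void)
{
        curthread->td_pinned++;
}

static __inline void
sched_unpin(void)
{
        curthread->td_pinned--;
}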

The THREAD_CAN_MIGRATE() check is very cheap for sched_switch() (just check
if a private per-thread word is non-zero) and the slightly more expensive
(but still fairly cheap) THREAD_CAN_SCHED() is only tested for threads
that are still allowed to migrate.  Also note that this path only happens
for context switches that
preempt a running thread (either a user preemption in userret(), or a kernel
preemption for an ithread).
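
Concretely, those two checks boil down to something like the following
(paraphrased from the macros near the top of sched_ule.c, not a verbatim
copy): the migration test compares a single per-thread word against zero, and
the cpuset test is a bit test against the thread's mask, so the extra cost in
sched_switch() stays small:

/* Migration is allowed only while the thread holds no pins. */
#define THREAD_CAN_MIGRATE(td)  ((td)->td_pinned == 0)
/* A CPU is eligible only if it is set in the thread's cpuset mask. */
#define THREAD_CAN_SCHED(td, cpu)       \
        CPU_ISSET((cpu), &(td)->td_cpuset->cs_mask)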

-- 
John Baldwin