On Wednesday, September 01, 2010 12:54:13 pm mdf_at_freebsd.org wrote:
> On Wed, Sep 1, 2010 at 6:49 AM, John Baldwin <jhb_at_freebsd.org> wrote:
> > On Tuesday, August 31, 2010 2:53:12 pm mdf_at_freebsd.org wrote:
> >> On Tue, Aug 31, 2010 at 10:16 AM, <mdf_at_freebsd.org> wrote:
> >> > I recorded the stack any time ts->ts_cpu was set, and when a thread
> >> > was migrated by sched_switch() I printed out the recorded info.
> >> > Here's what I found:
> >> >
> >> > XXX bug 67957: moving 0xffffff003ff9b800 from 3 to 1
> >> > [1]: pin 0 state 4 move 3 -> 1 done by 0xffffff000cc44000:
> >> > #0 0xffffffff802b36b4 at bug67957+0x84
> >> > #1 0xffffffff802b5dd4 at sched_affinity+0xd4
> >> > #2 0xffffffff8024a707 at cpuset_setthread+0x137
> >> > #3 0xffffffff8024aeae at cpuset_setaffinity+0x21e
> >> > #4 0xffffffff804a82df at freebsd32_cpuset_setaffinity+0x4f
> >> > #5 0xffffffff80295f49 at isi_syscall+0x99
> >> > #6 0xffffffff804a630e at ia32_syscall+0x1ce
> >> > #7 0xffffffff8046dc60 at Xint0x80_syscall+0x60
> >> > [0]: pin 0 state 2 move 0 -> 3 done by 0xffffff000cc44000:
> >> > #0 0xffffffff802b36b4 at bug67957+0x84
> >> > #1 0xffffffff802b4ad8 at sched_add+0xe8
> >> > #2 0xffffffff8029b96a at create_thread+0x34a
> >> > #3 0xffffffff8029badc at kern_thr_new+0x8c
> >> > #4 0xffffffff804a8912 at freebsd32_thr_new+0x122
> >> > #5 0xffffffff80295f49 at isi_syscall+0x99
> >> > #6 0xffffffff804a630e at ia32_syscall+0x1ce
> >> > #7 0xffffffff8046dc60 at Xint0x80_syscall+0x60
> >> >
> >> > So one thread in the process called cpuset_setaffinity(2), and
> >> > another thread in the process was forcibly migrated by the IPI while
> >> > returning from a syscall, while it had td_pinned set.
> >> >
> >> > Given this path, it seems reasonable to me to skip the migration if
> >> > we notice THREAD_CAN_MIGRATE is false.
> >> >
> >> > Opinions? My debug code is below. I'll try to write a short testcase
> >> > that exhibits this bug.
> >>
> >> Just a few more thoughts on this. The check in sched_affinity for
> >> THREAD_CAN_MIGRATE is racy. Since WITNESS uses sched_pin, it's not
> >> simple to take the THREAD lock around an increment of td_pinned. So
> >> I'm looking for suggestions on the best way to fix this issue. My
> >> thoughts:
> >>
> >> 1) Add a check in sched_switch() for THREAD_CAN_MIGRATE.
> >> 2) Have WITNESS not use sched_pin, and take the THREAD lock when
> >> modifying td_pinned.
> >> 3) Have the IPI_PREEMPT handler notice the thread is pinned (and not
> >> trying to bind) and postpone the mi_switch, just like it postpones
> >> when a thread is in a critical section.
> >>
> >> Except for the potential complexity of implementation, I think (2) is
> >> the best solution.
> >
> > I don't like 2) only because you'd really need to fix all the other
> > places that use sched_pin() to grab the thread lock. That would be a
> > bit expensive. I think the problem is code that checks
> > THREAD_CAN_MIGRATE() for running threads that aren't curthread.
> >
> > The sched_affinity() bug is probably my fault, FWIW. I think something
> > along the lines of 1) is the best approach. Maybe something like the
> > patch below. It moves the CPU assignment out of sched_affinity() and
> > into sched_switch() instead, so that it takes effect on the next
> > context switch for this thread. That is the point at which the CPU
> > binding actually takes effect anyway, since sched_affinity() doesn't
> > force an immediate context switch. This patch is relative to 7. The
> > patch for 9 is nearly identical (just a fixup for ipi_cpu() vs
> > ipi_selected()).
> >
> > Index: sched_ule.c
> > ===================================================================
> > --- sched_ule.c (revision 212065)
> > +++ sched_ule.c (working copy)
> > @@ -1885,6 +1885,8 @@
> >          srqflag = (flags & SW_PREEMPT) ?
> >              SRQ_OURSELF|SRQ_YIELDING|SRQ_PREEMPTED :
> >              SRQ_OURSELF|SRQ_YIELDING;
> > +        if (THREAD_CAN_MIGRATE(td) && !THREAD_CAN_SCHED(td, ts->ts_cpu))
> > +            ts->ts_cpu = sched_pickcpu(td, 0);
> >          if (ts->ts_cpu == cpuid)
> >              tdq_add(tdq, td, srqflag);
> >          else
> > @@ -2536,7 +2538,6 @@
> >  {
> >  #ifdef SMP
> >      struct td_sched *ts;
> > -    int cpu;
> >
> >      THREAD_LOCK_ASSERT(td, MA_OWNED);
> >      ts = td->td_sched;
> > @@ -2550,17 +2551,13 @@
> >      if (!TD_IS_RUNNING(td))
> >          return;
> >      td->td_flags |= TDF_NEEDRESCHED;
> > -    if (!THREAD_CAN_MIGRATE(td))
> > -        return;
> >      /*
> > -     * Assign the new cpu and force a switch before returning to
> > -     * userspace. If the target thread is not running locally send
> > -     * an ipi to force the issue.
> > +     * Force a switch before returning to userspace. If the
> > +     * target thread is not running locally send an ipi to force
> > +     * the issue.
> >      */
> > -    cpu = ts->ts_cpu;
> > -    ts->ts_cpu = sched_pickcpu(td, 0);
> > -    if (cpu != PCPU_GET(cpuid))
> > -        ipi_selected(1 << cpu, IPI_PREEMPT);
> > +    if (td != curthread)
> > +        ipi_selected(1 << ts->ts_cpu, IPI_PREEMPT);
> >  #endif
> >  }
>
> I will test this patch out; thanks for the help!
>
> Two questions:
>
> 1) How does a thread get moved between CPUs when it's not running? I
> see that we change the runqueue for non-running threads that are on a
> runqueue. Does the code always check for THREAD_CAN_SCHED when adding
> a sleeping thread to a runqueue on wakeup?

Yes. That is done in sched_add() when it calls sched_pickcpu().
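Roughly, the wakeup placement works like this (a simplified sketch of
the idea, not the exact sched_ule.c source; tdq_load() here is a
hypothetical helper standing in for the real per-CPU run-queue load
lookup):

    /*
     * Simplified sketch, not the exact sched_ule.c source.
     * THREAD_CAN_SCHED() asks whether a CPU is in the thread's
     * cpuset mask; the CPU picker skips any CPU that fails it,
     * so a thread waking up is always placed on an allowed CPU.
     */
    #define THREAD_CAN_SCHED(td, cpu) \
            CPU_ISSET((cpu), &(td)->td_cpuset->cs_mask)

    static int
    pickcpu_sketch(struct thread *td)
    {
            int cpu, best = -1;

            for (cpu = 0; cpu <= mp_maxid; cpu++) {
                    if (!THREAD_CAN_SCHED(td, cpu))
                            continue;       /* forbidden by the cpuset */
                    if (best == -1 || tdq_load(cpu) < tdq_load(best))
                            best = cpu;     /* prefer least-loaded CPU */
            }
            return (best);
    }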
> 2) I assume sched_switch() runs a lot more often than sched_pin() or
> sched_affinity(), so it would make sense to add any performance
> penalty of increased work to either of those places rather than to
> the scheduler. I suppose the two memory references for
> THREAD_CAN_MIGRATE and THREAD_CAN_SCHED won't be that expensive.

Well, sched_pin() is actually used fairly often, and importantly, it is
used under the assumption that it is very cheap (specifically, cheaper
than critical_enter() and critical_exit()). The THREAD_CAN_MIGRATE()
check is very cheap for sched_switch() (just a test of whether a private
per-thread word is non-zero), and the slightly more expensive (but still
fairly cheap) THREAD_CAN_SCHED() is only tested for threads that can
migrate (i.e., unpinned threads). Also note that this path only happens
for context switches that preempt a running thread (either a user
preemption in userret(), or a kernel preemption for an ithread).
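For reference, the pinning primitives are roughly the following (a
simplified sketch paraphrased from sys/sched.h and sched_ule.c, not the
exact source):

    /*
     * Simplified sketch of the pinning primitives.  Pinning is
     * one unlocked increment of a word that only curthread
     * writes, which is why it is cheaper than
     * critical_enter()/critical_exit() and why wrapping it in
     * the thread lock (option 2 above) would be comparatively
     * expensive.
     */
    static __inline void
    sched_pin(void)
    {
            curthread->td_pinned++;
    }

    static __inline void
    sched_unpin(void)
    {
            curthread->td_pinned--;
    }

    /* Cheap test: a single private per-thread word. */
    #define THREAD_CAN_MIGRATE(td)  ((td)->td_pinned == 0)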
-- 
John Baldwin

Received on Wed Sep 01 2010 - 15:25:02 UTC