On Friday 13 May 2011 09:43:25 Andriy Gapon wrote:
> This is a change in the vein of what I've been doing in the xcpu branch
> and it's supposed to fix the issue exposed by the recent commit that
> (probably unintentionally) stress-tests smp_rendezvous in the TSC code.
>
> Non-essential changes:
> - ditch the initial, and in my opinion useless, pre-setup rendezvous in
>   smp_rendezvous_action()
> - shift the smp_rv_waiters indexes by one because of the above
>
> Essential changes (the fix):
> - re-use the freed smp_rv_waiters[2] to indicate that a slave/target is
>   really fully done with the rendezvous (i.e. it's not going to access
>   any members of the smp_rv_* pseudo-structure)
> - spin on smp_rv_waiters[2] upon _entry_ to smp_rendezvous_cpus() so as
>   not to re-use the smp_rv_* pseudo-structure too early
>
> Index: sys/kern/subr_smp.c
> ===================================================================
> --- sys/kern/subr_smp.c (revision 221835)
> +++ sys/kern/subr_smp.c (working copy)
> @@ -316,19 +316,14 @@
>     void (*local_action_func)(void*) = smp_rv_action_func;
>     void (*local_teardown_func)(void*) = smp_rv_teardown_func;

You really need a read/load with a memory fence here in order to make sure
you get up-to-date values on all CPUs (see the sketch at the end of this
message).

> -   /* Ensure we have up-to-date values. */
> -   atomic_add_acq_int(&smp_rv_waiters[0], 1);
> -   while (smp_rv_waiters[0] < smp_rv_ncpus)
> -       cpu_spinwait();
> -
>     /* setup function */
>     if (local_setup_func != smp_no_rendevous_barrier) {
>         if (smp_rv_setup_func != NULL)
>             smp_rv_setup_func(smp_rv_func_arg);
>
>         /* spin on entry rendezvous */
> -       atomic_add_int(&smp_rv_waiters[1], 1);
> -       while (smp_rv_waiters[1] < smp_rv_ncpus)
> +       atomic_add_int(&smp_rv_waiters[0], 1);
> +       while (smp_rv_waiters[0] < smp_rv_ncpus)
>             cpu_spinwait();
>     }
>
> @@ -337,12 +332,16 @@
>         local_action_func(local_func_arg);
>
>     /* spin on exit rendezvous */
> -   atomic_add_int(&smp_rv_waiters[2], 1);
> -   if (local_teardown_func == smp_no_rendevous_barrier)
> +   atomic_add_int(&smp_rv_waiters[1], 1);
> +   if (local_teardown_func == smp_no_rendevous_barrier) {
> +       atomic_add_int(&smp_rv_waiters[2], 1);
>         return;
> -   while (smp_rv_waiters[2] < smp_rv_ncpus)
> +   }
> +   while (smp_rv_waiters[1] < smp_rv_ncpus)
>         cpu_spinwait();
>
> +   atomic_add_int(&smp_rv_waiters[2], 1);
> +
>     /* teardown function */
>     if (local_teardown_func != NULL)
>         local_teardown_func(local_func_arg);
> @@ -377,6 +376,10 @@
>     /* obtain rendezvous lock */
>     mtx_lock_spin(&smp_ipi_mtx);
>
> +   /* Wait for any previous unwaited rendezvous to finish. */
> +   while (atomic_load_acq_int(&smp_rv_waiters[2]) < ncpus)
> +       cpu_spinwait();
> +

This is not the ncpus you are looking for. I have a patch of my own that
might be overdoing it ... but it is attached nonetheless ... The rmlock is
a separate issue, but it is what led me to discover the problem with the
rendezvous. (A sketch of the ncpus point also follows below.)

>     /* set static function pointers */
>     smp_rv_ncpus = ncpus;
>     smp_rv_setup_func = setup_func;
> @@ -395,7 +398,7 @@
>         smp_rendezvous_action();
>
>     if (teardown_func == smp_no_rendevous_barrier)
> -       while (atomic_load_acq_int(&smp_rv_waiters[2]) < ncpus)
> +       while (atomic_load_acq_int(&smp_rv_waiters[1]) < ncpus)
>             cpu_spinwait();
>
>     /* release lock */
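To make the fence remark above concrete, here is a minimal userland sketch
of the ordering I have in mind (plain C11 atomics and pthreads, with
made-up names; it is not the kernel code, only the acquire/release pairing
that the local_*_func copies depend on):

/*
 * Illustration only: the "master" publishes an action function and its
 * argument, then does a release store on a publication flag; the "target"
 * spins with an acquire load and only then copies the values into locals.
 * Without that pairing the target may legally observe stale pointers,
 * which is exactly the hazard with the local copies quoted above.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static void (*rv_action_func)(void *);  /* stands in for smp_rv_action_func */
static void *rv_func_arg;               /* stands in for smp_rv_func_arg */
static atomic_int rv_published;         /* publication flag */

static void
say_hello(void *arg)
{
    printf("action ran with arg: %s\n", (const char *)arg);
}

static void *
target_cpu(void *dummy)
{
    (void)dummy;

    /* Spin until the master has published (acquire side of the pairing). */
    while (atomic_load_explicit(&rv_published, memory_order_acquire) == 0)
        ;   /* cpu_spinwait() equivalent */

    /* Now it is safe to copy the shared values into locals. */
    void (*local_action)(void *) = rv_action_func;
    void *local_arg = rv_func_arg;

    local_action(local_arg);
    return (NULL);
}

int
main(void)
{
    pthread_t t;

    pthread_create(&t, NULL, target_cpu, NULL);

    /* Master: set up the shared state, then publish it (release side). */
    rv_action_func = say_hello;
    rv_func_arg = "published after a release store";
    atomic_store_explicit(&rv_published, 1, memory_order_release);

    pthread_join(t, NULL);
    return (0);
}

Build with something like "cc -std=c11 -pthread". In the kernel the same
effect comes from doing the copies only after an acquire-flavoured atomic
(or an explicit read barrier) that pairs with a release on the initiating
CPU.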
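And to spell out the ncpus remark: at the point where the new wait is
added, smp_ipi_mtx is already held but smp_rv_ncpus has not been
overwritten yet, so it still describes the previous rendezvous, and that
is the count the drain has to compare against, not the ncpus argument of
the call we are about to start. Roughly (a sketch of the idea only, not
the patch I attached):

    /* obtain rendezvous lock */
    mtx_lock_spin(&smp_ipi_mtx);

    /*
     * Drain any previous, unwaited-for rendezvous.  Under the lock,
     * smp_rv_ncpus still holds the CPU count of that previous call
     * (the new ncpus is only assigned further down), so that is the
     * value the completion count has to reach.
     */
    while (atomic_load_acq_int(&smp_rv_waiters[2]) < smp_rv_ncpus)
        cpu_spinwait();

    /* set static function pointers */
    smp_rv_ncpus = ncpus;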