Re: proposed smp_rendezvous change

From: Max Laier <max_at_love2party.net> Date: Sun, 15 May 2011 12:09:13 -0400 · This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:14 UTC

On Sunday 15 May 2011 11:16:03 John Baldwin wrote:
> On 5/15/11 10:53 AM, Andriy Gapon wrote:
> > on 15/05/2011 10:12 Andriy Gapon said the following:
> >> on 14/05/2011 18:25 John Baldwin said the following:
> >>> Hmmm, so this is not actually sufficient.  NetApp ran into a very
> >>> similar race with virtual CPUs in BHyVe.  In their case because
> >>> virtual CPUs are threads that can be preempted, they have a chance at
> >>> a longer race.
> >>> 
> >>> The problem that they see is that even though the values have been
> >>> updated, the next CPU to start a rendezvous can clear
> >>> smp_rv_waiters[2] to zero before one of the other CPUs notices that it
> >>> has finished.
> >> 
> >> As a follow up to my previous question.  Have you noticed that in my
> >> patch no slave CPU actually waits/spins on smp_rv_waiters[2]?  It's
> >> always only master CPU (and under smp_ipi_mtx).
> > 
> > Here's a cleaner version of my approach to the fix.
> > This one does not remove the initial wait on smp_rv_waiters[0] in
> > smp_rendezvous_action() and thus does not renumber all smp_rv_waiters[]
> > members and thus hopefully should be clearer.
> > 
> > Index: sys/kern/subr_smp.c
> > ===================================================================
> > --- sys/kern/subr_smp.c	(revision 221943)
> > +++ sys/kern/subr_smp.c	(working copy)
> > _at__at_ -110,7 +110,7 _at__at_ static void (*volatile smp_rv_setup_func)(void *ar
> > 
> >   static void (*volatile smp_rv_action_func)(void *arg);
> >   static void (*volatile smp_rv_teardown_func)(void *arg);
> >   static void *volatile smp_rv_func_arg;
> > 
> > -static volatile int smp_rv_waiters[3];
> > +static volatile int smp_rv_waiters[4];
> > 
> >   /*
> >   
> >    * Shared mutex to restrict busywaits between smp_rendezvous() and
> > 
> > _at__at_ -338,11 +338,15 _at__at_ smp_rendezvous_action(void)
> > 
> >   	/* spin on exit rendezvous */
> >   	atomic_add_int(&smp_rv_waiters[2], 1);
> > 
> > -	if (local_teardown_func == smp_no_rendevous_barrier)
> > +	if (local_teardown_func == smp_no_rendevous_barrier) {
> > +		atomic_add_int(&smp_rv_waiters[3], 1);
> > 
> >                   return;
> > 
> > +	}
> > 
> >   	while (smp_rv_waiters[2]<  smp_rv_ncpus)
> >   	
> >   		cpu_spinwait();
> > 
> > +	atomic_add_int(&smp_rv_waiters[3], 1);
> > +
> > 
> >   	/* teardown function */
> >   	if (local_teardown_func != NULL)
> >   	
> >   		local_teardown_func(local_func_arg);
> > 
> > _at__at_ -377,6 +381,9 _at__at_ smp_rendezvous_cpus(cpumask_t map,
> > 
> >   	/* obtain rendezvous lock */
> >   	mtx_lock_spin(&smp_ipi_mtx);
> > 
> > +	while (smp_rv_waiters[3]<  smp_rv_ncpus)
> > +		cpu_spinwait();
> > +
> > 
> >   	/* set static function pointers */
> >   	smp_rv_ncpus = ncpus;
> >   	smp_rv_setup_func = setup_func;
> > 
> > _at__at_ -385,6 +392,7 _at__at_ smp_rendezvous_cpus(cpumask_t map,
> > 
> >   	smp_rv_func_arg = arg;
> >   	smp_rv_waiters[1] = 0;
> >   	smp_rv_waiters[2] = 0;
> > 
> > +	smp_rv_waiters[3] = 0;
> > 
> >   	atomic_store_rel_int(&smp_rv_waiters[0], 0);
> >   	
> >   	/* signal other processors, which will enter the IPI with interrupts
> >   	off */
> 
> Ahh, so the bump is after the change.  I do think this will still be ok
> and I probably just didn't explain it well to Neel.  I wonder though
> if the bump shouldn't happen until after the call of the local teardown
> function?

I don't think we ever intended to synchronize the local teardown part, and I 
believe that is the correct behavior for this API.

This version is sufficiently close to what I have, so I am resonably sure that 
it will work for us.  It seems, however, that if we move to check to after 
picking up the lock anyway, the generation approach has even less impact and I 
am starting to prefer that solution.

Andriy, is there any reason why you'd prefer your approach over the generation 
version?

Thanks,
  Max