Re: proposed smp_rendezvous change

From: John Baldwin <jhb@freebsd.org>
Date: Thu, 19 May 2011 08:07:34 -0400
On Wednesday, May 18, 2011 4:27:50 pm Max Laier wrote:
> On 05/17/2011 01:35 PM, John Baldwin wrote:
> ...
> > Yeah, I already have a patch to do that, but hadn't added atomic ops to
> > critical_enter() and critical_exit().  But it also wasn't as fancy in the
> > critical_exit() case.  Here is what I have and I think it might actually
> > be ok (it doesn't use an atomic read and clear, but I think it is safe).
> > Hmm, actually, it will need to use the read and clear:
> 
> Looks good to me.  I was slightly surprised by this:
> 
> > Index: kern/kern_synch.c
> > ===================================================================
> > --- kern/kern_synch.c	(revision 222024)
> > +++ kern/kern_synch.c	(working copy)
> > @@ -400,9 +400,7 @@
> >   	if (!TD_ON_LOCK(td) && !TD_IS_RUNNING(td))
> >   		mtx_assert(&Giant, MA_NOTOWNED);
> >   #endif
> > -	KASSERT(td->td_critnest == 1 || (td->td_critnest == 2 &&
> > -	    (td->td_owepreempt) && (flags & SW_INVOL) != 0 &&
> > -	    newtd == NULL) || panicstr,
> > +	KASSERT(td->td_critnest == 1 || panicstr,
> >   	    ("mi_switch: switch in a critical section"));
> >   	KASSERT((flags & (SW_INVOL | SW_VOL)) != 0,
> >   	    ("mi_switch: switch must be voluntary or involuntary"));
> 
> part of the patch.  But that is in fact correct and much more expressive 
> and safe than the version we had before.

Actually, this is something that was made obsolete by ups's change to 
critical_exit() several years ago.
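
For context, here is roughly what critical_exit() has looked like since that
change (paraphrased from memory rather than quoted from kern_switch.c, and
simplified -- the real code also picks a SWT_* switch type for mi_switch()).
The window between clearing td_critnest and acting on td_owepreempt is
exactly where an IPI can land:

void
critical_exit(void)
{
	struct thread *td;

	td = curthread;
	KASSERT(td->td_critnest != 0, ("critical_exit: td_critnest == 0"));
	if (td->td_critnest == 1) {
		td->td_critnest = 0;
		/*
		 * An IPI delivered at this point interrupts a thread with
		 * td_critnest == 0 but td_owepreempt possibly still set,
		 * which is the race the rendezvous patch below closes.
		 */
		if (td->td_owepreempt) {
			td->td_critnest = 1;
			thread_lock(td);
			td->td_critnest--;
			mi_switch(SW_INVOL | SW_PREEMPT, NULL);
			thread_unlock(td);
		}
	} else
		td->td_critnest--;
}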

However, after sleeping on this last night, I think that critical_enter() and 
critical_exit() are a very hot path and that it may be worth adding some 
additional complexity to keep them as cheap as possible.  Also, your original 
patch for rm locks is actually correct.  They will not miss a preemption 
because rm_cleanIPI will not schedule a thread to run while it holds the spin 
lock.  However, I'd like to be a bit more future proof so that other IPI 
handler authors don't have to deal with this obscure race.  Given that, I've 
generalized your fix and moved it up into the SMP rendezvous code itself along 
with a big comment to explain why it works and a KASSERT().  I think this is 
MFC'able back to 7 as well:

Index: kern/subr_smp.c
===================================================================
--- kern/subr_smp.c	(revision 222032)
+++ kern/subr_smp.c	(working copy)
@@ -316,6 +316,9 @@
 	void (*local_action_func)(void*);
 	void (*local_teardown_func)(void*);
 	int generation;
+#ifdef INVARIANTS
+	int owepreempt;
+#endif
 
 	/* Ensure we have up-to-date values. */
 	atomic_add_acq_int(&smp_rv_waiters[0], 1);
@@ -330,6 +333,33 @@
 	generation = smp_rv_generation;
 
 	/*
+	 * Use a nested critical section to prevent any preemptions
+	 * from occurring during a rendezvous action routine.
+	 * Specifically, if a rendezvous handler is invoked via an IPI
+	 * and the interrupted thread was in the critical_exit()
+	 * function after setting td_critnest to 0 but before
+	 * performing a deferred preemption, this routine can be
+	 * invoked with td_critnest set to 0 and td_owepreempt true.
+	 * In that case, a critical_exit() during the rendezvous
+	 * action would trigger a preemption which is not permitted in
+	 * a rendezvous action.  To fix this, wrap all of the
+	 * rendezvous action handlers in a critical section.  We
+	 * cannot use a regular critical section however as having
+	 * critical_exit() preempt from this routine would also be
+	 * problematic (the preemption must not occur before the IPI
+	 * has been acknowledged via an EOI).  Instead, we
+	 * intentionally ignore td_owepreempt when leaving the
+	 * critical section.  This should be harmless because we do not
+	 * permit rendezvous action routines to schedule threads, and
+	 * thus td_owepreempt should never transition from 0 to 1
+	 * during this routine.
+	 */
+	td->td_critnest++;
+#ifdef INVARIANTS
+	owepreempt = td->td_owepreempt;
+#endif
+	
+	/*
 	 * If requested, run a setup function before the main action
 	 * function.  Ensure all CPUs have completed the setup
 	 * function before moving on to the action function.
@@ -362,14 +392,18 @@
 	 */
 	MPASS(generation == smp_rv_generation);
 	atomic_add_int(&smp_rv_waiters[2], 1);
-	if (local_teardown_func == smp_no_rendevous_barrier)
-                return;
-	while (smp_rv_waiters[2] < smp_rv_ncpus &&
-	    generation == smp_rv_generation)
-		cpu_spinwait();
+	if (local_teardown_func != smp_no_rendevous_barrier) {
+		while (smp_rv_waiters[2] < smp_rv_ncpus &&
+		    generation == smp_rv_generation)
+			cpu_spinwait();
 
-	if (local_teardown_func != NULL)
-		local_teardown_func(local_func_arg);
+		if (local_teardown_func != NULL)
+			local_teardown_func(local_func_arg);
+	}
+
+	td->td_critnest--;
+	KASSERT(owepreempt == td->td_owepreempt,
+	    ("rendezvous action changed td_owepreempt"));
 }
 
 void
Index: kern/kern_synch.c
===================================================================
--- kern/kern_synch.c	(revision 222024)
+++ kern/kern_synch.c	(working copy)
@@ -400,9 +400,7 @@
 	if (!TD_ON_LOCK(td) && !TD_IS_RUNNING(td))
 		mtx_assert(&Giant, MA_NOTOWNED);
 #endif
-	KASSERT(td->td_critnest == 1 || (td->td_critnest == 2 &&
-	    (td->td_owepreempt) && (flags & SW_INVOL) != 0 &&
-	    newtd == NULL) || panicstr,
+	KASSERT(td->td_critnest == 1 || panicstr,
 	    ("mi_switch: switch in a critical section"));
 	KASSERT((flags & (SW_INVOL | SW_VOL)) != 0,
 	    ("mi_switch: switch must be voluntary or involuntary"));

-- 
John Baldwin