smp_rendezvous runs with interrupts and preemption enabled on unicore systems

From: Ryan Stone <rysto32_at_gmail.com>
Date: Fri, 28 Oct 2011 11:37:14 -0400

I'm seeing issues on a unicore system running a derivative of FreeBSD
8.2-RELEASE if something calls mem_range_attr_set.  It turns out that
the root cause is a bug in smp_rendezvous_cpus.  The first part of
smp_rendezvous_cpus attempts to short-circuit the non-SMP case (note
that smp_started is never set to 1 on a unicore system):

	if (!smp_started) {
		if (setup_func != NULL)
			setup_func(arg);
		if (action_func != NULL)
			action_func(arg);
		if (teardown_func != NULL)
			teardown_func(arg);
		return;
	}

The problem is that this runs with interrupts enabled, outside of a
critical section.  My system runs with device_polling enabled with hz
set to 2500, so it's quite easy to wedge the system by having a thread
run mem_range_attr_set.  That has to do a smp_rendezvous, and if a
timer interrupt happens to go off half-way through the action_func and
preempt this thread, the system ends up deadlocked (although once it's
wedged, typing at the serial console stands a good chance of unwedging
the system.  Go figure).

I know that smp_rendezvous was reworked substantially on HEAD, but by
inspection it looks like the bug is still present, as the
short-circuit behaviour is still there.

I am not entirely sure of the best way to fix this.  Is it as simple
as doing a spinlock_enter before setup_func and a spinlock_exit after
teardown_func?  It seems to boot fine, but I'm not at all confident
that I understand the nuances of smp_rendezvous to be sure that there
aren't side effects that I don't know about.
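
For concreteness, here is a minimal userland sketch of the change I'm
asking about.  It is not the kernel code: mock_spinlock_enter and
mock_spinlock_exit are hypothetical stand-ins for spinlock_enter and
spinlock_exit (which in the kernel disable interrupts and enter a
critical section), and rendezvous_unicore, record_depth and demo are
names made up for illustration.  It only shows the bracketing of the
three callbacks, and says nothing about possible side effects.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-ins for the kernel's spinlock_enter()/spinlock_exit().
 * Here they only track a nesting depth so the bracketing can be observed.
 */
static int mock_spinlock_depth;

static void mock_spinlock_enter(void) { mock_spinlock_depth++; }
static void mock_spinlock_exit(void)  { mock_spinlock_depth--; }

typedef void rendezvous_fn(void *);

/*
 * Sketch of the non-SMP short-circuit with the proposed bracketing:
 * all three callbacks run between enter and exit.
 */
static void
rendezvous_unicore(rendezvous_fn *setup_func, rendezvous_fn *action_func,
    rendezvous_fn *teardown_func, void *arg)
{
	mock_spinlock_enter();
	if (setup_func != NULL)
		setup_func(arg);
	if (action_func != NULL)
		action_func(arg);
	if (teardown_func != NULL)
		teardown_func(arg);
	mock_spinlock_exit();
}

/* Example callback: records the nesting depth it was called at. */
static void
record_depth(void *arg)
{
	*(int *)arg = mock_spinlock_depth;
}

/* Usage: returns 1 if the callback ran inside the protected section. */
static int
demo(void)
{
	int seen = -1;

	rendezvous_unicore(NULL, record_depth, NULL, &seen);
	assert(mock_spinlock_depth == 0);	/* enter/exit are balanced */
	return (seen == 1);
}
```

In the real short-circuit path the early return would come after the
spinlock_exit, so the exit is never skipped.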
Received on Fri Oct 28 2011 - 13:37:15 UTC