Re: smp_rendezvous runs with interrupts and preemption enabled on unicore systems

From: Attilio Rao <attilio_at_freebsd.org>
Date: Tue, 1 Nov 2011 00:43:03 +0100
2011/10/28  <mdf_at_freebsd.org>:
> On Fri, Oct 28, 2011 at 8:37 AM, Ryan Stone <rysto32_at_gmail.com> wrote:
>> I'm seeing issues on a unicore systems running a derivative of FreeBSD
>> 8.2-RELEASE if something calls mem_range_attr_set.  It turns out that
>> the root cause is a bug in smp_rendezvous_cpus.  The first part of
>> smp_rendezvous_cpus attempts to short-circuit the non-SMP case(note
>> that smp_started is never set to 1 on a unicore system):
>>
>>        if (!smp_started) {
>>                if (setup_func != NULL)
>>                        setup_func(arg);
>>                if (action_func != NULL)
>>                        action_func(arg);
>>                if (teardown_func != NULL)
>>                        teardown_func(arg);
>>                return;
>>        }
>>
>> The problem is that this runs with interrupts enabled, outside of a
>> critical section.  My system runs with device_polling enabled with hz
>> set to 2500, so its quite easy to wedge the system by having a thread
>> run mem_range_attr_set.  That has to do a smp_rendezvous, and if a
>> timer interrupt happens to go off half-way through the action_func and
>> preempt this thread, the system ends up deadlocked(although once it's
>> wedged, typing at the serial console stands a good chance of unwedging
>> the system.  Go figure).

I'm not entirely sure why this exactly breaks though (do you see that
happening with a random rendezvous callback or it is always the
same?), because that just becames a simple function calling on cpu0,
even if I think that there is still a bug as smp_rendezvous callbacks
may expect to have interrupts and preemption disabled and the
short-circuit breaks that assumption.

>> I know that smp_rendezvous was reworked substantially on HEAD, but by
>> inspection it looks like the bug is still present, as the
>> short-circuit behaviour is still there.
>>
>> I am not entirely sure of the best way to fix this.  Is it as simple
>> as doing a spinlock_enter before setup_func and a spinlock_exit after
>> teardown_func?  It seems to boot fine, but I'm not at all confident
>> that I understand the nuances of smp_rendezvous to be sure that there
>> aren't side effects that I don't know about.
>
> Looks like Max didn't have time to upstream this fix:
>
>  struct mtx smp_ipi_mtx;
> +MTX_SYSINIT(smp_ipi_mtx, &smp_ipi_mtx, "smp rendezvous", MTX_SPIN);
>
> ...
>
>  static void
>  mp_start(void *dummy)
>  {
>
> -       mtx_init(&smp_ipi_mtx, "smp rendezvous", NULL, MTX_SPIN);
>
> ...
>
>        if (!smp_started) {
> +               mtx_lock_spin(&smp_ipi_mtx);
>                if (setup_func != NULL)
>                        setup_func(arg);
>                if (action_func != NULL)
>                        action_func(arg);
>                if (teardown_func != NULL)
>                        teardown_func(arg);
> +               mtx_unlock_spin(&smp_ipi_mtx);
>                return;
>        }

I don't think that we strictly need the lock here, my preferred
solution would be (only test-compiled):
Index: sys/kern/subr_smp.c
===================================================================
--- sys/kern/subr_smp.c (revision 226972)
+++ sys/kern/subr_smp.c (working copy)
_at__at_ -415,13 +415,16 _at__at_ smp_rendezvous_cpus(cpuset_t map,
 {
        int curcpumap, i, ncpus = 0;

+       /* Look comments in the !SMP case. */
        if (!smp_started) {
+               spinlock_enter();
                if (setup_func != NULL)
                        setup_func(arg);
                if (action_func != NULL)
                        action_func(arg);
                if (teardown_func != NULL)
                        teardown_func(arg);
+               spinlock_exit();
                return;
        }

_at__at_ -666,12 +669,18 _at__at_ smp_rendezvous_cpus(cpuset_t map,
        void (*teardown_func)(void *),
        void *arg)
 {
+       /*
+        * In the !SMP case we just need to ensure the same initial conditions
+        * as the SMP case.
+        */
+       spinlock_enter();
        if (setup_func != NULL)
                setup_func(arg);
        if (action_func != NULL)
                action_func(arg);
        if (teardown_func != NULL)
                teardown_func(arg);
+       spinlock_exit();
 }

 void
_at__at_ -681,12 +690,15 _at__at_ smp_rendezvous(void (*setup_func)(void *),
               void *arg)
 {

+       /* Look comments in the smp_rendezvous_cpus() case. */
+       spinlock_enter();
        if (setup_func != NULL)
                setup_func(arg);
        if (action_func != NULL)
                action_func(arg);
        if (teardown_func != NULL)
                teardown_func(arg);
+       spinlock_exit();
 }

 /*


Thanks,
Attilio


-- 
Peace can only be achieved by understanding - A. Einstein
Received on Mon Oct 31 2011 - 22:43:05 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:19 UTC