2011/10/28 <mdf_at_freebsd.org>: > On Fri, Oct 28, 2011 at 8:37 AM, Ryan Stone <rysto32_at_gmail.com> wrote: >> I'm seeing issues on a unicore systems running a derivative of FreeBSD >> 8.2-RELEASE if something calls mem_range_attr_set. It turns out that >> the root cause is a bug in smp_rendezvous_cpus. The first part of >> smp_rendezvous_cpus attempts to short-circuit the non-SMP case(note >> that smp_started is never set to 1 on a unicore system): >> >> if (!smp_started) { >> if (setup_func != NULL) >> setup_func(arg); >> if (action_func != NULL) >> action_func(arg); >> if (teardown_func != NULL) >> teardown_func(arg); >> return; >> } >> >> The problem is that this runs with interrupts enabled, outside of a >> critical section. My system runs with device_polling enabled with hz >> set to 2500, so its quite easy to wedge the system by having a thread >> run mem_range_attr_set. That has to do a smp_rendezvous, and if a >> timer interrupt happens to go off half-way through the action_func and >> preempt this thread, the system ends up deadlocked(although once it's >> wedged, typing at the serial console stands a good chance of unwedging >> the system. Go figure). I'm not entirely sure why this exactly breaks though (do you see that happening with a random rendezvous callback or it is always the same?), because that just becames a simple function calling on cpu0, even if I think that there is still a bug as smp_rendezvous callbacks may expect to have interrupts and preemption disabled and the short-circuit breaks that assumption. >> I know that smp_rendezvous was reworked substantially on HEAD, but by >> inspection it looks like the bug is still present, as the >> short-circuit behaviour is still there. >> >> I am not entirely sure of the best way to fix this. Is it as simple >> as doing a spinlock_enter before setup_func and a spinlock_exit after >> teardown_func? It seems to boot fine, but I'm not at all confident >> that I understand the nuances of smp_rendezvous to be sure that there >> aren't side effects that I don't know about. > > Looks like Max didn't have time to upstream this fix: > > struct mtx smp_ipi_mtx; > +MTX_SYSINIT(smp_ipi_mtx, &smp_ipi_mtx, "smp rendezvous", MTX_SPIN); > > ... > > static void > mp_start(void *dummy) > { > > - mtx_init(&smp_ipi_mtx, "smp rendezvous", NULL, MTX_SPIN); > > ... > > if (!smp_started) { > + mtx_lock_spin(&smp_ipi_mtx); > if (setup_func != NULL) > setup_func(arg); > if (action_func != NULL) > action_func(arg); > if (teardown_func != NULL) > teardown_func(arg); > + mtx_unlock_spin(&smp_ipi_mtx); > return; > } I don't think that we strictly need the lock here, my preferred solution would be (only test-compiled): Index: sys/kern/subr_smp.c =================================================================== --- sys/kern/subr_smp.c (revision 226972) +++ sys/kern/subr_smp.c (working copy) _at__at_ -415,13 +415,16 _at__at_ smp_rendezvous_cpus(cpuset_t map, { int curcpumap, i, ncpus = 0; + /* Look comments in the !SMP case. */ if (!smp_started) { + spinlock_enter(); if (setup_func != NULL) setup_func(arg); if (action_func != NULL) action_func(arg); if (teardown_func != NULL) teardown_func(arg); + spinlock_exit(); return; } _at__at_ -666,12 +669,18 _at__at_ smp_rendezvous_cpus(cpuset_t map, void (*teardown_func)(void *), void *arg) { + /* + * In the !SMP case we just need to ensure the same initial conditions + * as the SMP case. + */ + spinlock_enter(); if (setup_func != NULL) setup_func(arg); if (action_func != NULL) action_func(arg); if (teardown_func != NULL) teardown_func(arg); + spinlock_exit(); } void _at__at_ -681,12 +690,15 _at__at_ smp_rendezvous(void (*setup_func)(void *), void *arg) { + /* Look comments in the smp_rendezvous_cpus() case. */ + spinlock_enter(); if (setup_func != NULL) setup_func(arg); if (action_func != NULL) action_func(arg); if (teardown_func != NULL) teardown_func(arg); + spinlock_exit(); } /* Thanks, Attilio -- Peace can only be achieved by understanding - A. EinsteinReceived on Mon Oct 31 2011 - 22:43:05 UTC
This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:19 UTC