on 17/05/2011 14:56 John Baldwin said the following:
> On 5/17/11 4:03 AM, Andriy Gapon wrote:
>> Couldn't [Shouldn't] the whole:
>>
>>>>> /* Ensure we have up-to-date values. */
>>>>> atomic_add_acq_int(&smp_rv_waiters[0], 1);
>>>>> while (smp_rv_waiters[0] < smp_rv_ncpus)
>>>>>         cpu_spinwait();
>>
>> be just replaced with:
>>
>> rmb();
>>
>> Or a proper MI function that does just a read memory barrier, if rmb() is
>> not that.
>
> No, you could replace it with:
>
> atomic_add_acq_int(&smp_rv_waiters[0], 1);

What about

(void)atomic_load_acq(&smp_rv_waiters[0]);

In my opinion that should ensure that the hardware posts the latest value of
smp_rv_waiters[0] from the master CPU to memory, and that a slave CPU then
fetches it from there.  And also, because of the memory barriers inserted by
store_rel on the master CPU and load_acq on the slave CPU, the latest values
of all the other smp_rv_* fields should become visible to the slave CPU.

> The key being that atomic_add_acq_int() will block (either in hardware or
> software) until it can safely perform the atomic operation.  That means
> waiting until the write to set smp_rv_waiters[0] to 0 by the rendezvous
> initiator is visible to the current CPU.
>
> On some platforms a write by one CPU may not post instantly to other CPUs
> (e.g. it may sit in a store buffer).  That is fine so long as an attempt to
> update that value atomically (using cas or a conditional-store, etc.) fails.
> For those platforms, the atomic(9) API is required to spin until it succeeds.
>
> This is why the mtx code spins if it can't set MTX_CONTESTED for example.

Thank you for the great explanation!

Taking sparc64 as an example, I think that atomic_load_acq uses a degenerate
cas call, which should take care of hardware synchronization.

-- 
Andriy Gapon
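
P.S.  To make the pairing concrete, here is a small, untested userland sketch
that uses C11 atomics to model what I understand the store_rel/load_acq pair
guarantees.  The names (smp_rv_arg, smp_rv_waiters0) are only stand-ins for
illustration, not the real smp_rendezvous fields or the kernel atomic(9) API:

#include <stdatomic.h>
#include <stdio.h>
#include <threads.h>

static int smp_rv_arg;			/* plain field set up by the master */
static atomic_int smp_rv_waiters0;	/* stand-in for smp_rv_waiters[0]   */

static int
master(void *arg)
{
	(void)arg;
	smp_rv_arg = 42;			/* ordinary store            */
	atomic_store_explicit(&smp_rv_waiters0, 1,
	    memory_order_release);		/* models store_rel          */
	return (0);
}

static int
slave(void *arg)
{
	(void)arg;
	/* Models load_acq: spin until the release store is observed. */
	while (atomic_load_explicit(&smp_rv_waiters0,
	    memory_order_acquire) < 1)
		;				/* cpu_spinwait()            */
	/* The acquire/release pairing guarantees this prints 42, not 0. */
	printf("slave sees smp_rv_arg = %d\n", smp_rv_arg);
	return (0);
}

int
main(void)
{
	thrd_t m, s;

	thrd_create(&s, slave, NULL);
	thrd_create(&m, master, NULL);
	thrd_join(m, NULL);
	thrd_join(s, NULL);
	return (0);
}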