From: Bruce Evans [mailto:bde_at_zeta.org.au]
> On Wed, 5 May 2004, Andrew Gallatin wrote:
> ...
> > Actually, I think his tests are accurate and bus locked instructions
> > take an eternity on P4. See
> > http://www.uwsg.iu.edu/hypermail/linux/kernel/0109.3/0687.html
> >
> > For example, with your test above, I see 212 cycles for the UP case on
> > a 2.53GHz P4. Replacing the atomic_store_rel_int(&slock, 0) with a
> > simple slock = 0; reduces that count to 18 cycles.
>
> This seems to be right, unfortunately. I wonder if this has anything to
> do with freebsd.org having no P4 machines.
>
> > If its really safe to remove the xchg* from non-SMP atomic_store_rel*,
> > then I think you should do it. Of course, that still leaves mutexes
> > as very expensive on SMP (253 cycles on the 2.53GHz from above).
>
> I forgot (again) that there are memory access ordering issues. A lock
> may be needed to get everything synced. See the comment before the i386
> versions in i386/include/atomic.h. A single lock may be enough. The
> best example I could think of easily is:

On the P4, there are mfence, lfence, sfence instructions to enforce
memory ordering. These are cheaper than "lock; andl" or "cpuid", which
are the traditional 'sync' instructions.

Received on Thu May 06 2004 - 04:52:30 UTC
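
To make the trade-off in this thread concrete, here is a rough sketch in portable C11 atomics rather than the actual i386/include/atomic.h macros. The function names are hypothetical; only slock comes from the quoted test. It assumes a typical x86 compiler, where an atomic exchange usually becomes an implicitly locked xchg, a release store becomes a plain mov, and a sequentially consistent fence becomes mfence or a locked instruction. Which instructions you actually get, and how many cycles they cost, depends on the compiler and the CPU.

```c
#include <assert.h>
#include <stdatomic.h>

/* Lock word, named after the slock variable in the quoted test. */
static atomic_int slock;

/* Acquire with an atomic exchange. On x86 this is typically an xchg,
 * which is implicitly bus locked. Returns 1 if we took the lock. */
static int slock_try_acquire(void)
{
    return atomic_exchange_explicit(&slock, 1, memory_order_acquire) == 0;
}

/* Release with release ordering only. On x86 compilers typically emit
 * a plain store here, comparable to the "slock = 0;" variant. */
static void slock_release_plain(void)
{
    atomic_store_explicit(&slock, 0, memory_order_release);
}

/* Release, then a full fence. Compilers typically emit mfence or a
 * locked instruction for the fence (an assumption, not a guarantee). */
static void slock_release_fenced(void)
{
    atomic_store_explicit(&slock, 0, memory_order_release);
    atomic_thread_fence(memory_order_seq_cst);
}

/* Exercise both release paths; returns the final lock value. */
static int slock_demo(void)
{
    assert(slock_try_acquire());
    slock_release_plain();
    assert(slock_try_acquire());
    slock_release_fenced();
    return atomic_load(&slock);
}
```

Calling slock_demo() from a test program checks only that both release paths leave the lock free. It says nothing about cycle counts, which would need a timing loop on the machine in question.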