RE: 4.7 vs 5.2.1 SMP/UP bridging performance

From: Don Bowman <don_at_sandvine.com>
Date: Thu, 6 May 2004 09:52:27 -0400

From: Bruce Evans [mailto:bde_at_zeta.org.au]
> On Wed, 5 May 2004, Andrew Gallatin wrote:
> 
 ...

> >
> > Actually, I think his tests are accurate and bus locked instructions
> > take an eternity on P4.  See
> > http://www.uwsg.iu.edu/hypermail/linux/kernel/0109.3/0687.html
> >
> > For example, with your test above, I see 212 cycles for the UP case on
> > a 2.53GHz P4.  Replacing the atomic_store_rel_int(&slock, 0) with a
> > simple slock = 0; reduces that count to 18 cycles.
> 
> This seems to be right, unfortunately.  I wonder if this has anything to
> do with freebsd.org having no P4 machines.
> 
> > If it's really safe to remove the xchg* from non-SMP atomic_store_rel*,
> > then I think you should do it.  Of course, that still leaves mutexes
> > as very expensive on SMP (253 cycles on the 2.53GHz from above).
> 
> I forgot (again) that there are memory access ordering issues.  A lock
> may be needed to get everything synced.  See the comment before the i386
> versions in i386/include/atomic.h.  A single lock may be enough.  The
> best example I could think of easily is:

On the P4, the mfence, lfence, and sfence instructions are available to
enforce memory ordering. These are cheaper than "lock; andl" or "cpuid",
which are the traditional 'sync' instructions.
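
To make the comparison concrete, here is a minimal sketch of the alternatives
discussed in this thread, assuming GCC or Clang inline asm on i386/amd64. It is
not the code in i386/include/atomic.h: the helper names (full_fence,
locked_fence, slock_release_xchg, and so on) are made up for illustration,
slock is only borrowed from the quoted test, and "lock; addl $0" on the stack
stands in for the locked-instruction idiom. Whether a fence is needed at all
before the plain store is exactly the open question above.

```c
#include <assert.h>

/* Hypothetical lock word, named after the slock variable in the quoted test. */
static volatile int slock = 1;

#if defined(__i386__) || defined(__x86_64__)
/* mfence orders earlier loads and stores before later ones (SSE2 and up). */
static inline void full_fence(void)  { __asm__ __volatile__("mfence" ::: "memory"); }
/* lfence orders loads only; sfence orders stores only. */
static inline void load_fence(void)  { __asm__ __volatile__("lfence" ::: "memory"); }
static inline void store_fence(void) { __asm__ __volatile__("sfence" ::: "memory"); }

/* Older idiom: a bus-locked read-modify-write that changes nothing. */
static inline void locked_fence(void)
{
#if defined(__x86_64__)
    __asm__ __volatile__("lock; addl $0,(%%rsp)" ::: "memory", "cc");
#else
    __asm__ __volatile__("lock; addl $0,(%%esp)" ::: "memory", "cc");
#endif
}

/* Release via xchg: implicitly locked when one operand is in memory. */
static inline void slock_release_xchg(void)
{
    int zero = 0;
    __asm__ __volatile__("xchgl %0, %1" : "+r"(zero), "+m"(slock) : : "memory");
}
#else
/* Fallback for other targets (assumption: GCC/Clang __atomic builtins). */
static inline void full_fence(void)   { __atomic_thread_fence(__ATOMIC_SEQ_CST); }
static inline void load_fence(void)   { __atomic_thread_fence(__ATOMIC_ACQUIRE); }
static inline void store_fence(void)  { __atomic_thread_fence(__ATOMIC_RELEASE); }
static inline void locked_fence(void) { __atomic_thread_fence(__ATOMIC_SEQ_CST); }
static inline void slock_release_xchg(void)
{
    (void)__atomic_exchange_n(&slock, 0, __ATOMIC_SEQ_CST);
}
#endif

/* Release via an explicit fence followed by a plain store. */
static inline void slock_release_fenced(void)
{
    full_fence();   /* earlier accesses are ordered before the unlock */
    slock = 0;
}

/* Exercise every helper once; returns the final lock word (0 = released). */
static int slock_selfcheck(void)
{
    slock = 1; slock_release_xchg();   assert(slock == 0);
    slock = 1; slock_release_fenced(); assert(slock == 0);
    load_fence(); store_fence(); locked_fence();
    return slock;
}
```

Timing each release variant in a loop with rdtsc, as in the quoted test, would
be the way to check the cycle counts on a given CPU; this sketch only shows
the instruction sequences and makes no claim about the numbers.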