RE: 4.7 vs 5.2.1 SMP/UP bridging performance

From: Andrew Gallatin <gallatin_at_cs.duke.edu>
Date: Wed, 5 May 2004 17:23:30 -0400 (EDT)
Bruce Evans writes:

 > 
 > Athlon XP2600 UP system:  !SMP case: 22 cycles   SMP case: 37 cycles
 > Celeron 366 SMP system:              35                    48
 > 
 > The extra cycles for the SMP case are just the extra cost of one lock
 > instruction.  Note that SMP should cost twice as much extra, but the
 > non-SMP atomic_store_rel_int(&slock, 0) is pessimized by using xchgl
 > which always locks the bus.  After fixing this:
 > 
 > Athlon XP2600 UP system:  !SMP case:  6 cycles   SMP case: 37 cycles
 > Celeron 366 SMP system:              10                    48
 > 
 > Mutexes take longer than simple locks, but not much longer unless the
 > lock is contested.  In particular, they don't lock the bus any more
 > and the extra cycles for locking dominate (even in the !SMP case due
 > to the pessimization).
 > 
 > So there seems to be something wrong with your benchmark.  Locking the
 > bus for the SMP case always costs about 20+ cycles, but this hasn't
 > changed since RELENG_4 and mutexes can't be made much faster in the
 > uncontested case since their overhead is dominated by the bus lock
 > time.
 > 

Actually, I think his tests are accurate, and bus-locked instructions
take an eternity on the P4.  See
http://www.uwsg.iu.edu/hypermail/linux/kernel/0109.3/0687.html 

For example, with your test above, I see 212 cycles for the UP case on
a 2.53GHz P4.  Replacing the atomic_store_rel_int(&slock, 0) with a
simple slock = 0; reduces that count to 18 cycles.
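
A minimal sketch of the comparison above, assuming C11 <stdatomic.h>
rather than the kernel's own atomic primitives.  The names
release_xchg, release_store and ns_per_round are hypothetical, and it
reports nanoseconds from clock() rather than cycle counts, so the
numbers will not line up with the ones quoted in this thread.  On x86,
compilers typically emit xchg for the exchange and a plain mov for the
release store, but that is worth confirming in the generated assembly.

```c
#include <assert.h>
#include <stdatomic.h>
#include <time.h>

/* Stand-in for a release implemented with an exchange.  With a memory
 * operand, xchg is implicitly locked on x86 (assumed codegen). */
void release_xchg(atomic_int *lock)
{
    (void)atomic_exchange(lock, 0);
}

/* Stand-in for the proposed fix: a release store, which on x86 is
 * normally an ordinary store with no lock prefix (assumed codegen). */
void release_store(atomic_int *lock)
{
    atomic_store_explicit(lock, 0, memory_order_release);
}

/* Average CPU time in nanoseconds for one set-then-release round.
 * The "take" step is a relaxed store so that only the release path
 * differs between the two variants; a real lock acquire would use a
 * compare-and-set instead. */
double ns_per_round(void (*release)(atomic_int *), long iters)
{
    atomic_int lock = 0;

    assert(iters > 0);
    clock_t start = clock();
    for (long i = 0; i < iters; i++) {
        atomic_store_explicit(&lock, 1, memory_order_relaxed);
        release(&lock);
    }
    clock_t end = clock();

    /* Both variants must leave the lock released. */
    assert(atomic_load(&lock) == 0);
    return (double)(end - start) * 1e9 / CLOCKS_PER_SEC / (double)iters;
}
```

Calling ns_per_round(release_xchg, n) and ns_per_round(release_store, n)
with a large n (tens of millions, say) should show whether the locked
exchange dominates on a given CPU.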

If it's really safe to remove the xchg* from non-SMP atomic_store_rel*,
then I think you should do it.  Of course, that still leaves mutexes
as very expensive on SMP (253 cycles on the same 2.53GHz P4).

Drew
Received on Wed May 05 2004 - 12:23:44 UTC
