Bruce Evans writes:
 >
 > Athlon XP2600 UP system:  !SMP case: 22 cycles   SMP case: 37 cycles
 > Celeron 366 SMP system:              35                    48
 >
 > The extra cycles for the SMP case are just the extra cost of one lock
 > instruction.  Note that SMP should cost twice as much extra, but the
 > non-SMP atomic_store_rel_int(&slock, 0) is pessimized by using xchgl
 > which always locks the bus.  After fixing this:
 >
 > Athlon XP2600 UP system:  !SMP case:  6 cycles   SMP case: 37 cycles
 > Celeron 366 SMP system:              10                    48
 >
 > Mutexes take longer than simple locks, but not much longer unless the
 > lock is contested.  In particular, they don't lock the bus any more
 > and the extra cycles for locking dominate (even in the !SMP case due
 > to the pessimization).
 >
 > So there seems to be something wrong with your benchmark.  Locking the
 > bus for the SMP case always costs about 20+ cycles, but this hasn't
 > changed since RELENG_4 and mutexes can't be made much faster in the
 > uncontested case since their overhead is dominated by the bus lock
 > time.
 >

Actually, I think his tests are accurate and bus-locked instructions
take an eternity on P4.  See
http://www.uwsg.iu.edu/hypermail/linux/kernel/0109.3/0687.html

For example, with your test above, I see 212 cycles for the UP case on
a 2.53GHz P4.  Replacing the atomic_store_rel_int(&slock, 0) with a
simple slock = 0; reduces that count to 18 cycles.

If it's really safe to remove the xchg* from non-SMP atomic_store_rel*,
then I think you should do it.

Of course, that still leaves mutexes as very expensive on SMP (253
cycles on the 2.53GHz P4 from above).

Drew

Received on Wed May 05 2004 - 12:23:44 UTC
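
For illustration only, the two ways of releasing the lock compared above can be sketched in portable C11 atomics. This is not the FreeBSD atomic_store_rel_int() implementation: the function names try_acquire, release_xchg and release_store are hypothetical, and slock here is a plain C11 atomic standing in for the lock word in the benchmark. The sketch assumes a typical x86 compiler, where the exchange is emitted as xchg (implicitly bus-locked with a memory operand) and the release store as an ordinary mov.

```c
#include <stdatomic.h>

/* Stand-in for the benchmark's lock word (assumption, not kernel code). */
atomic_uint slock;

/* Try to take the lock: 0 -> 1.  Returns 1 on success, 0 if already held. */
int try_acquire(void)
{
    unsigned int expected = 0;
    return atomic_compare_exchange_strong_explicit(
        &slock, &expected, 1u,
        memory_order_acquire, memory_order_relaxed);
}

/* Release by exchanging in 0.  On x86 this is typically an xchg, which
 * locks the bus even without an explicit lock prefix. */
void release_xchg(void)
{
    (void)atomic_exchange_explicit(&slock, 0u, memory_order_seq_cst);
}

/* Release with a plain store that has release ordering.  On x86 this is
 * typically an ordinary mov, with no locked instruction. */
void release_store(void)
{
    atomic_store_explicit(&slock, 0u, memory_order_release);
}
```

Both release functions leave slock at 0; the difference is only in the instruction the compiler is expected to emit, which is where the cycle counts quoted above diverge.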