Andrew Gallatin wrote:
> Bruce Evans writes:
> >
> > Athlon XP2600 UP system:  !SMP case: 22 cycles   SMP case: 37 cycles
> > Celeron 366 SMP system:              35                    48
> >
> > The extra cycles for the SMP case are just the extra cost of a one lock
> > instruction.  Note that SMP should cost twice as much extra, but the
> > non-SMP atomic_store_rel_int(&slock, 0) is pessimized by using xchgl
> > which always locks the bus.  After fixing this:
> >
> > Athlon XP2600 UP system:  !SMP case:  6 cycles   SMP case: 37 cycles
> > Celeron 366 SMP system:              10                    48
> >
> > Mutexes take longer than simple locks, but not much longer unless the
> > lock is contested.  In particular, they don't lock the bus any more
> > and the extra cycles for locking dominate (even in the !SMP case due
> > to the pessimization).
> >
> > So there seems to be something wrong with your benchmark.  Locking the
> > bus for the SMP case always costs about 20+ cycles, but this hasn't
> > changed since RELENG_4 and mutexes can't be made much faster in the
> > uncontested case since their overhead is dominated by the bus lock
> > time.
> >
>
> Actually, I think his tests are accurate and bus locked instructions
> take an eternity on P4.  See
> http://www.uwsg.iu.edu/hypermail/linux/kernel/0109.3/0687.html
>
> For example, with your test above, I see 212 cycles for the UP case on
> a 2.53GHz P4.  Replacing the atomic_store_rel_int(&slock, 0) with a
> simple slock = 0; reduces that count to 18 cycles.
>
> If its really safe to remove the xchg* from non-SMP atomic_store_rel*,
> then I think you should do it.  Of course, that still leaves mutexes
> as very expensive on SMP (253 cycles on the 2.53GHz from above).
>
> Drew
>

I wonder if there is anything that can be done to make the locking more
efficient for the Xeon. Are there any other locking types that could be
used instead?

This might also explain why we are seeing much worse system call
performance under 4.7 in SMP versus UP.
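The comparison quoted above (releasing the lock with an xchg-style store versus a plain store) can be roughly approximated in user space. The following is only a sketch under assumptions: it uses C11 atomics rather than the kernel's atomic_store_rel_int(), the names now_ns, ns_per_iter_xchg_release and ns_per_iter_store_release are made up for illustration, and it reports nanoseconds per iteration from clock_gettime() instead of cycle counts. What instructions the compiler actually emits depends on the compiler and target.

```c
#define _XOPEN_SOURCE 700
#include <stdatomic.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical lock word, standing in for slock in the quoted test. */
atomic_int slock;

/* Monotonic clock reading in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/*
 * Acquire with a compare-and-swap, release with an atomic exchange.
 * On x86 an exchange is typically compiled to xchg, which is
 * implicitly bus-locked (an assumption about the compiler output).
 */
double ns_per_iter_xchg_release(long iters)
{
    uint64_t start = now_ns();
    for (long i = 0; i < iters; i++) {
        int expected = 0;
        atomic_compare_exchange_strong_explicit(&slock, &expected, 1,
            memory_order_acquire, memory_order_relaxed);
        (void)atomic_exchange_explicit(&slock, 0, memory_order_release);
    }
    return (double)(now_ns() - start) / (double)iters;
}

/*
 * Same acquire, but release with a plain release store.
 * On x86 this is typically an ordinary mov with no lock prefix.
 */
double ns_per_iter_store_release(long iters)
{
    uint64_t start = now_ns();
    for (long i = 0; i < iters; i++) {
        int expected = 0;
        atomic_compare_exchange_strong_explicit(&slock, &expected, 1,
            memory_order_acquire, memory_order_relaxed);
        atomic_store_explicit(&slock, 0, memory_order_release);
    }
    return (double)(now_ns() - start) / (double)iters;
}
```

Calling both functions with the same iteration count from a small main() and printing the two results gives a rough per-iteration comparison. The loop is single-threaded, so the lock is never contested, which matches the uncontested case discussed in the quoted text.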
Here is a table of results for some system call tests I ran.
(The numbers are calls/s.)

2.8GHz Xeon
                     UP       SMP
write            904427    661312
socket          1327692   1067743
select           554131    434390
gettimeofday    1734963    252479

1.3GHz PIII
                     UP       SMP
write            746705    532223
socket          1179819    977448
select           727811    556537
gettimeofday    1849862    186387

The really interesting one is gettimeofday. For both the Xeon & PIII,
the UP is much better than SMP, but the UP for PIII is better than that
of the Xeon.

I may try to get the results for 5.2.1 later. I can forward the source
code of this program to anyone else who wants to try it out.

Thanks,

Gerrit

Received on Wed May 05 2004 - 16:16:39 UTC
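For anyone wanting to reproduce a calls/s figure like the ones in the table, a minimal sketch of such a measurement follows. This is not the program used for the numbers above; the function names gettimeofday_rate and smp_slowdown_percent are hypothetical, and on some modern systems gettimeofday() may be answered without entering the kernel, so results would not be comparable to the 4.7 figures.

```c
#define _XOPEN_SOURCE 700
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>
#include <time.h>

/* Monotonic clock reading in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Call gettimeofday() `calls` times and return the rate in calls/s. */
double gettimeofday_rate(long calls)
{
    struct timeval tv;
    uint64_t start = now_ns();
    for (long i = 0; i < calls; i++)
        gettimeofday(&tv, NULL);
    uint64_t elapsed = now_ns() - start;
    if (elapsed == 0)
        return 0.0; /* clock too coarse for this few calls */
    return (double)calls * 1e9 / (double)elapsed;
}

/* SMP slowdown expressed as a percentage of the UP rate. */
double smp_slowdown_percent(double up_rate, double smp_rate)
{
    return 100.0 * (up_rate - smp_rate) / up_rate;
}
```

With the Xeon gettimeofday figures from the table, smp_slowdown_percent(1734963, 252479) comes out to roughly 85%, and the PIII figures give roughly 90%.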