Certain operational sequences fare really badly when cpu_idle_hlt is turned off, and it's definitely due to contention. I've seen this quite a lot. I have some numbers below.

Generally speaking I think it's a good idea to wake up a HLTed cpu, but it has to be done intelligently: only wake it up if you have work that it can do, don't send multiple IPIs if it hasn't processed the first one you sent (hey, wakeup! HEY WAKEUP! WAKEUP FASTER! <GRIN>), and send the IPI asynchronously.

As for where the contention is occurring, I think the scheduling queues are only part of the problem. BGL contention is going to be an issue as well, and it will be especially bad because the nesting count is integrated with the MP lock. I recommend putting the BGL nesting count in the thread structure and leaving the lock as a straight -1 or cpuid.

Also, the initial disposition of a forked process could have a huge effect due to L1/L2 cache locality. Consider the cache cost of a fork which does an immediate exec where the fork is scheduled on a different cpu. Nasty!
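To make the three wakeup rules above concrete, here is a minimal user-space sketch in C. It is not kernel code and not the actual FreeBSD implementation: the names cpu_state, wakeup_cpu, send_ipi_async and wakeup_ipi_handler are all hypothetical, and the "IPI" is just a counter standing in for the hardware. The only point is the pending flag: a second wakeup is suppressed until the target has processed the first.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define NCPU 2

/* Hypothetical per-cpu state; the field names are illustrative only. */
struct cpu_state {
    atomic_bool halted;       /* cpu is sitting in HLT */
    atomic_bool ipi_pending;  /* wakeup IPI sent, not yet processed */
    int         ipis_received; /* counter standing in for real IPIs */
};

static struct cpu_state cpus[NCPU];

/* Stand-in for a fire-and-forget IPI: the sender does not wait for an ack. */
static void send_ipi_async(int cpu)
{
    cpus[cpu].ipis_received++;
}

/*
 * Wake a halted cpu only if all three conditions hold:
 *   1. there is work that cpu can actually run,
 *   2. the cpu really is halted,
 *   3. no earlier wakeup IPI is still outstanding.
 * Returns true if an IPI was sent.
 */
static bool wakeup_cpu(int cpu, int runnable_for_cpu)
{
    struct cpu_state *cs = &cpus[cpu];

    if (runnable_for_cpu == 0)
        return false;
    if (!atomic_load(&cs->halted))
        return false;
    /* atomic_exchange returns the old value: true means already pending. */
    if (atomic_exchange(&cs->ipi_pending, true))
        return false;

    send_ipi_async(cpu);
    return true;
}

/* Runs on the target cpu when the IPI arrives; re-arms future wakeups. */
static void wakeup_ipi_handler(int cpu)
{
    atomic_store(&cpus[cpu].halted, false);
    atomic_store(&cpus[cpu].ipi_pending, false);
}
```

With cpu 1 marked halted, the first wakeup_cpu(1, 1) sends one IPI and any further calls are no-ops until wakeup_ipi_handler(1) has run, so the sender never piles up IPIs or spins waiting on the target.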
					-Matt

(5.0) DELL 2550 2xCPU P3 1.2GHz (I gotta update that machine's OS)

machdep.cpu_idle_hlt=0

    fork/exit/wait: 4.543s 10000 loops = 454.271uS/loop
    fork/exit/wait: 4.572s 10000 loops = 457.228uS/loop
    fork/exit/wait: 4.598s 10000 loops = 459.773uS/loop
    full duplex pipe / 1char: 3.786s 100000 loops = 37.859uS/loop
    full duplex pipe / 1char: 3.917s 100000 loops = 39.170uS/loop
    full duplex pipe / 1char: 4.075s 100000 loops = 40.747uS/loop

machdep.cpu_idle_hlt=1

    fork/exit/wait: 3.179s 10000 loops = 317.879uS/loop
    fork/exit/wait: 3.181s 10000 loops = 318.129uS/loop
    fork/exit/wait: 3.241s 10000 loops = 324.111uS/loop
    full duplex pipe / 1char: 2.235s 100000 loops = 22.348uS/loop
    full duplex pipe / 1char: 2.370s 100000 loops = 23.696uS/loop
    full duplex pipe / 1char: 2.489s 100000 loops = 24.894uS/loop

(4.7) DELL 2550 2xCPU P3 1.2GHz

machdep.cpu_idle_hlt=0

    fork/exit/wait: 2.640s 10000 loops = 263.974uS/loop
    fork/exit/wait: 2.772s 10000 loops = 277.175uS/loop
    fork/exit/wait: 2.772s 10000 loops = 277.216uS/loop
    full duplex pipe / 1char: 3.541s 100000 loops = 35.412uS/loop
    full duplex pipe / 1char: 3.596s 100000 loops = 35.961uS/loop
    full duplex pipe / 1char: 3.451s 100000 loops = 34.511uS/loop

machdep.cpu_idle_hlt=1

    fork/exit/wait: 1.570s 10000 loops = 157.002uS/loop
    fork/exit/wait: 1.571s 10000 loops = 157.052uS/loop
    fork/exit/wait: 1.576s 10000 loops = 157.606uS/loop
    full duplex pipe / 1char: 1.522s 100000 loops = 15.215uS/loop
    full duplex pipe / 1char: 1.521s 100000 loops = 15.211uS/loop
    full duplex pipe / 1char: 1.522s 100000 loops = 15.221uS/loop

Received on Fri Jul 11 2003 - 00:44:52 UTC