RE: SMP and setrunnable()- scheduler 4bsd

From: Matthew Dillon <dillon_at_apollo.backplane.com>
Date: Fri, 11 Jul 2003 02:44:49 -0700 (PDT)
    Certain operational sequences fare really badly when cpu_idle_hlt
    is turned off, and it's definitely due to contention.  I've seen this
    quite a lot.  I have some numbers below.

    Generally speaking I think it's a good idea to wake up a HLTed cpu, but
    it has to be done intelligently.  That is, you only wake it up if you
    have work that it can do, you don't send multiple IPIs if it hasn't
    processed the first one you sent (hey, wakeup!  HEY WAKEUP!  WAKEUP
    FASTER! <GRIN>), and you send the IPI asynchronously.
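
    The three rules above can be sketched roughly as follows.  This is
    a userland illustration using C11 atomics, not kernel code: struct
    cpu_state, send_ipi_async(), wakeup_cpu() and ipi_wakeup_handler()
    are all hypothetical names, and send_ipi_async() only bumps a counter
    where a real kernel would post the interrupt without waiting for an
    acknowledgement.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NCPU 2

/* Hypothetical per-CPU state; these names are not from the FreeBSD tree. */
struct cpu_state {
    atomic_bool halted;       /* CPU is sitting in HLT */
    atomic_bool ipi_pending;  /* a wakeup IPI was sent and not yet handled */
    atomic_int  ipis_sent;    /* counter kept for illustration only */
};

static struct cpu_state cpus[NCPU];

/* Stand-in for posting an IPI without waiting for the target to respond. */
static void send_ipi_async(int cpu)
{
    atomic_fetch_add(&cpus[cpu].ipis_sent, 1);
}

/* Returns true only if an IPI was actually sent. */
static bool wakeup_cpu(int cpu, int runnable_for_cpu)
{
    assert(cpu >= 0 && cpu < NCPU);

    if (runnable_for_cpu <= 0)
        return false;                 /* no work that this cpu can do */
    if (!atomic_load(&cpus[cpu].halted))
        return false;                 /* it is not halted, nothing to wake */
    if (atomic_exchange(&cpus[cpu].ipi_pending, true))
        return false;                 /* first IPI not processed yet */

    send_ipi_async(cpu);              /* fire and forget */
    return true;
}

/* Run on the target cpu when the wakeup IPI is delivered. */
static void ipi_wakeup_handler(int cpu)
{
    atomic_store(&cpus[cpu].halted, false);
    atomic_store(&cpus[cpu].ipi_pending, false);
}
```

    The atomic_exchange() on ipi_pending is what coalesces repeated
    wakeups: any number of callers can race, but only the one that flips
    the flag from false to true sends the interrupt.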

    With regard to where the contention is occurring, I think the scheduling
    queues are only part of the problem.  BGL contention is going to be an
    issue as well but it will be especially bad due to the nesting count
    being integrated with the MP lock.  I recommend putting the BGL nesting
    count in the thread structure and leaving the lock as a straight
    -1 or cpuid.  Also, the initial disposition of a forked process 
    could have a huge effect due to L1/L2 cache locality.  Consider the
    cache cost of a fork which does an immediate exec where the fork is
    scheduled on a different cpu.  Nasty!
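
    The BGL layout recommended above (nesting count in the thread, lock
    word holding only -1 or a cpuid) might look something like the sketch
    below.  It is a userland illustration with C11 atomics under assumed
    names: mp_lock, td_mpcount, try_mplock() and rel_mplock() here are
    hypothetical and are not taken from any existing kernel source.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MP_FREE (-1)

/* The shared lock word: -1 when free, otherwise the owning cpuid. */
static atomic_int mp_lock = MP_FREE;

struct thread {
    int td_mpcount;   /* hypothetical field: BGL nesting depth of this thread */
};

/* Try to take the lock once; returns false if another cpu holds it. */
static bool try_mplock(struct thread *td, int cpuid)
{
    if (td->td_mpcount > 0) {
        /* Already held by this thread: nest without touching shared memory. */
        td->td_mpcount++;
        return true;
    }

    int expected = MP_FREE;
    if (!atomic_compare_exchange_strong(&mp_lock, &expected, cpuid))
        return false;

    td->td_mpcount = 1;
    return true;
}

/* Drop one nesting level; the shared word changes only on the last release. */
static void rel_mplock(struct thread *td, int cpuid)
{
    assert(td->td_mpcount > 0);
    assert(atomic_load(&mp_lock) == cpuid);

    if (--td->td_mpcount == 0)
        atomic_store(&mp_lock, MP_FREE);
}
```

    With this split, recursive acquires and releases touch only
    thread-local state, so the contended cache line is written once per
    outermost acquire and once per outermost release.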

						-Matt

(5.0)	DELL 2550 2xCPU P3 1.2GHz	(I gotta update that machine's OS)

machdep.cpu_idle_hlt=0
fork/exit/wait: 4.543s 10000 loops = 454.271uS/loop
fork/exit/wait: 4.572s 10000 loops = 457.228uS/loop
fork/exit/wait: 4.598s 10000 loops = 459.773uS/loop
full duplex pipe / 1char: 3.786s 100000 loops = 37.859uS/loop
full duplex pipe / 1char: 3.917s 100000 loops = 39.170uS/loop
full duplex pipe / 1char: 4.075s 100000 loops = 40.747uS/loop

machdep.cpu_idle_hlt=1
fork/exit/wait: 3.179s 10000 loops = 317.879uS/loop
fork/exit/wait: 3.181s 10000 loops = 318.129uS/loop
fork/exit/wait: 3.241s 10000 loops = 324.111uS/loop
full duplex pipe / 1char: 2.235s 100000 loops = 22.348uS/loop
full duplex pipe / 1char: 2.370s 100000 loops = 23.696uS/loop
full duplex pipe / 1char: 2.489s 100000 loops = 24.894uS/loop

(4.7)	DELL 2550 2xCPU P3 1.2GHz

machdep.cpu_idle_hlt=0
fork/exit/wait: 2.640s 10000 loops = 263.974uS/loop
fork/exit/wait: 2.772s 10000 loops = 277.175uS/loop
fork/exit/wait: 2.772s 10000 loops = 277.216uS/loop
full duplex pipe / 1char: 3.541s 100000 loops = 35.412uS/loop
full duplex pipe / 1char: 3.596s 100000 loops = 35.961uS/loop
full duplex pipe / 1char: 3.451s 100000 loops = 34.511uS/loop

machdep.cpu_idle_hlt=1
fork/exit/wait: 1.570s 10000 loops = 157.002uS/loop
fork/exit/wait: 1.571s 10000 loops = 157.052uS/loop
fork/exit/wait: 1.576s 10000 loops = 157.606uS/loop
full duplex pipe / 1char: 1.522s 100000 loops = 15.215uS/loop
full duplex pipe / 1char: 1.521s 100000 loops = 15.211uS/loop
full duplex pipe / 1char: 1.522s 100000 loops = 15.221uS/loop
Received on Fri Jul 11 2003 - 00:44:52 UTC
