>From John-Mark Gurney <gurney_j_at_resnet.uoregon.edu>, Sun, Oct 08, 2006 at 05:22:00PM -0700: > Wouldn't having a single run queue lock still serialize the cpu's when > getting a thread to run? Don't we really need a per cpu run queue, and > then have a scheduler that puts threads on the cpu's run queues? Just so. A fine-grain mutex model is much like pipelining a processor. You break a task up into many small tasks each with its own lock and then run the tasks in parallel. There is a problem though. 1) How many pipeline stages you want depends on the number of underlying CPUs. Too few stages means idle processors. Too many stages relative to the available resources means wasted time acquiring many locks. 2) The work has been relatively matched. One slow stage creates bottlenecks that everyone else eventually feels. 3) Cache coherency: handing off a datum between CPUs wastes cache resources. Allowing one CPU to process a given datum through a life-time of several different lock stages (tasks) means sharing the data covered by those task's locks. This sort of design problem is hard to solve and harder for several different developers some of them working in their spare-time to coordinate properly. It's even harder to do when the end application is unknown (i.e., in a general purpose OS). BUT, per-cpu algorithms are not necessarily better. you can waste cache resources by data replication. you need to provision load-balancing/migration. I had been hoping that mdillon _at_ dragonfly would provide good experimental evidence on this point. unfortunately, in my estimation he s distracted by some heavy goals (namely his single-image clustering). The consequence of this is 1) it will take a long-time before enough of the dragonfly kernel is out from under giant and 2) relative performance comparisons will be dubious because it will be unclear what cost he is paying for some of his single-image clustering support. Its a shame because the community would greatly benefit from clear experimental evidence contrasting some of the fine-grained locking approaches with a more per-cpu oriented design. AFAIK, the linux kernel has generally favored the per-cpu approach. In that respect, relative underperformance of FreeBSD vs. Linux is an indicator that per-cpu approaches deserve more weight in the FreeBSD world. PaulReceived on Sun Oct 08 2006 - 23:23:29 UTC
This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:39:01 UTC