In the last episode (Sep 15), Stephen Montgomery-Smith said:
> Stephen Montgomery-Smith wrote:
> > Steve Kargl wrote:
> >> On Mon, Sep 15, 2008 at 07:36:04PM -0500, Stephen Montgomery-Smith wrote:
> >>> ... and each thread is a loop of the form
> >>>
> >>> while (1) {
> >>>   wait until told to start;
> >>>   do massive amounts of floating point arithmetic (only additions and
> >>>   multiplications) on large arrays;
> >>>   tell the master process that you are done;
> >>> }
> >>>
> >>>> Do you have about as many threads as processor or more?
> >>> Both ways. The time difference between the two approaches is
> >>> negligible.
> >>>
> >>
> >> Are you using ULE? With my MPI applications, if the number of
> >> launched processes exceeds the number of cpus by 1, ULE falls
> >> through the floor. I have a nagging feeling that there is a problem
> >> with cpu affinity.
> >>
> >> http://lists.freebsd.org/pipermail/freebsd-current/2008-July/086917.html
> >
> > Let me say a little bit more.
>
> I have this gut feeling that the problem has a lot to do with cache
> management. My program has each thread doing, in effect, huge matrix
> multiplications, each one working on their own little bit. If a CPU
> core changes from one thread to another, it then has to flush out the
> cache to RAM, and read in a whole bunch of other RAM into cache.

You can try playing with the new cpuset functions in HEAD and 7-STABLE
to lock particular threads on certain CPUs.

-- 
	Dan Nelson
	dnelson_at_allantgroup.com

Received on Tue Sep 16 2008 - 02:32:30 UTC
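
For anyone wanting to try the cpuset suggestion, here is a rough sketch of what pinning a worker thread to one CPU might look like. It assumes each worker calls the helper once, before entering its compute loop. The helper name pin_current_thread_to_cpu is made up for illustration. The FreeBSD branch uses cpuset_setaffinity(2); the Linux branch is only there so the sketch builds on other systems and is not part of the original suggestion.

```c
/* Sketch only: pin the calling thread to a single CPU.
 * pin_current_thread_to_cpu() is a hypothetical helper name. */
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* needed on Linux for CPU_SET and friends */
#endif

#if defined(__FreeBSD__)
#include <sys/param.h>
#include <sys/cpuset.h>
#elif defined(__linux__)
#include <sched.h>
#endif

/* Returns 0 on success, -1 on failure (bad index or the call was refused). */
int pin_current_thread_to_cpu(int cpu)
{
#if defined(__FreeBSD__)
    cpuset_t mask;

    if (cpu < 0 || cpu >= CPU_SETSIZE)
        return -1;
    CPU_ZERO(&mask);
    CPU_SET(cpu, &mask);
    /* With CPU_WHICH_TID, an id of -1 means the calling thread. */
    return cpuset_setaffinity(CPU_LEVEL_WHICH, CPU_WHICH_TID, -1,
                              sizeof(mask), &mask);
#elif defined(__linux__)
    cpu_set_t mask;

    if (cpu < 0 || cpu >= CPU_SETSIZE)
        return -1;
    CPU_ZERO(&mask);
    CPU_SET(cpu, &mask);
    /* A pid of 0 means the calling thread. */
    return sched_setaffinity(0, sizeof(mask), &mask);
#else
    (void)cpu;
    return -1;                 /* no affinity call known for this platform */
#endif
}
```

A worker could call pin_current_thread_to_cpu(i) with its own thread index i at the top of its start routine, then compare timings against the unpinned run. Whether that actually helps the cache behaviour described above would have to be measured.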