On Sun, Jul 31, 2016 at 07:03:08AM -0700, Adrian Chadd wrote:
> Hi,
>
> Did you test on any 1, 2, 4, 8 cpu machines? just to see if there are
> any performance degradations on lower count CPUs?
>

I did not test on machines which physically have that few cpus, but I
did test the impact on a microbenchmark with 2 and 4 threads on the
80-way machine. There was no difference.

For this iteration of the patch, given limited time, I tried to be very
conservative so as not to introduce additional latency. In fact I would
argue the patch is undertuned (as in, it can do better in certain
workloads). That said, I think it is safe to use.

> Also, yeah, the MOD operator in each loop could get spendy on older
> CPUs (eg my MIPS CPUs, older ARM stuff, etc.) Is it possible to
> achieve much the same autotuning with pow2 operations instead of
> divide/mod?
>

The % operation acts as a randomizer. It is optional and I'm happy to
ifdef it based on the architecture. It does seem to be useful at least
on amd64.

As a side note, exponential backoff is not used, in order to keep
things smaller (see above). It is definitely subject to change later.

-- 
Mateusz Guzik <mjguzik gmail.com>

Received on Sun Jul 31 2016 - 18:36:18 UTC
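
To illustrate the divide/mod versus pow2 question discussed in this message, here is a minimal sketch. It is not the code from the patch: the function names, the `seed` argument (standing in for whatever cheap varying value the spin loop has at hand) and the `cap` argument (the upper bound on spin iterations) are all hypothetical. It only shows that when the cap is a power of two, a mask picks a value in the same range as `%` without a divide.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: pick a spin count in [0, cap) with modulo.
 * Works for any cap, but costs a divide, which can be slow on
 * CPUs without a fast hardware divider.
 */
static inline uint32_t
spin_count_mod(uint32_t seed, uint32_t cap)
{
	return (cap != 0 ? seed % cap : 0);
}

/*
 * Hypothetical pow2 variant: same range, but uses a mask.
 * cap must be a nonzero power of two for the mask to be exact.
 */
static inline uint32_t
spin_count_pow2(uint32_t seed, uint32_t cap)
{
	assert(cap != 0 && (cap & (cap - 1)) == 0);
	return (seed & (cap - 1));
}
```

For a power-of-two cap the two helpers return the same value, so an architecture-specific ifdef could in principle select between them. Whether the mask keeps the randomizing effect observed on amd64 would depend on how the cap is tuned and would need to be measured.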