Hi,

Did you test on any 1-, 2-, 4- or 8-CPU machines, just to see whether there are any performance degradations on lower CPU counts?

Also, yeah, the MOD operator in each loop could get spendy on older CPUs (e.g. my MIPS CPUs, older ARM stuff, etc.). Is it possible to achieve much the same autotuning with pow2 operations instead of divide/mod?

-a

Received on Sun Jul 31 2016 - 12:03:10 UTC
This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:41:07 UTC