On Sat, 20 Jan 2007, David Malone wrote:

> On Thu, Jan 18, 2007 at 11:16:19AM +1100, Bruce Evans wrote:
>> - the FPU routines are faster on Athlons (XP and 64 at least), but these
>>   didn't exist until 2001.  The introduction of these CPUs may have
>>   been the trigger for turning off the FPU routines in -current in 2001.
>>   Until then problems were limited to Pentium-1's since the dynamic
>>   configuration prevented the routines being used on all other machines.
>
> I think a very quirky K6-2 machine that I had let us reproduce the
> problem fairly dependably and may have been part of the reason it
> was finally turned off.

I just looked again at your old (2001) mail about this.  The userland
benchmark was flawed.  It tried 3 methods sequentially without warming
up caches, so all methods did unintended testing of I-cache misses
(including branch target cache misses) and the first method (userland
bzero) warmed up the D-cache for the other 2.

The kernel runtime configuration also fails to either warm or cool the
caches initially.  It assumes P1 cache sizes and depends on a 1MB buffer
being much larger than caches.  Maybe this was not enough for K6-2.  It
is certainly not enough for Athlon64, but I think it would mostly cause
false negatives so I don't understand why it gave a false positive for
the K6-2.

After fixing the userland benchmark, userland bzero did much better and
your benchmark agreed with mine that FPU methods for bzero are just
pessimizations on A64-AXP.  However, the behaviour for bcopy is quite
different on A64-AXP -- even the old FPU methods are small optimizations
in some cases (on A64, about 25% in the fully-L2 cached case; little
difference for other large copies).

Bruce

Received on Sun Jan 21 2007 - 03:03:52 UTC
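
To illustrate the benchmarking flaw described in the message, here is a
minimal sketch of a timing harness that runs untimed warm-up passes for
each method before measuring it, so that no method pays for another's
cold I-cache or benefits from another's D-cache warming.  This is not
the benchmark from the thread: the names zero_memset, zero_bytes and
time_zero are hypothetical, and the two candidates are simple stand-ins
rather than the userland or FPU bzero routines under discussion.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

typedef void (*zero_fn)(void *, size_t);

/* Hypothetical candidate 1: defer to the C library. */
static void
zero_memset(void *p, size_t n)
{
	memset(p, 0, n);
}

/* Hypothetical candidate 2: naive byte-at-a-time loop. */
static void
zero_bytes(void *p, size_t n)
{
	volatile unsigned char *c = p;

	while (n-- > 0)
		*c++ = 0;
}

/*
 * Return the average CPU time in seconds for one call of fn on buf.
 * The warm-up passes are not timed; they bring fn's code and the
 * buffer into whatever cache levels they fit in, so every method is
 * measured from a comparable (warm) state.
 */
static double
time_zero(zero_fn fn, void *buf, size_t len, int warmups, int reps)
{
	clock_t t0, t1;
	int i;

	assert(reps > 0);
	for (i = 0; i < warmups; i++)
		fn(buf, len);
	t0 = clock();
	for (i = 0; i < reps; i++)
		fn(buf, len);
	t1 = clock();
	return ((double)(t1 - t0) / CLOCKS_PER_SEC / reps);
}
```

Each method would be timed with its own call, e.g.
time_zero(zero_memset, buf, len, 2, 100) followed by
time_zero(zero_bytes, buf, len, 2, 100).  Measuring the cold-cache case
instead would need the opposite step, touching a region much larger
than the largest cache before each timed run; how large that region
must be depends on the CPU, which is the assumption the message says
the kernel's 1MB buffer gets wrong on newer machines.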