On Wed, 21 Jan 2004, John Baldwin wrote:

> On Tuesday 20 January 2004 11:58 pm, Bruce Evans wrote:
> > i386 (or equivalently, no special tuning) is the best default, at least
> > in non-FPU-intensive applications.  In my integer crunching application/
> > benchmark (searching a game tree), it even gives better results than
> > -mcpu=pentiumpro on a pentiumpro class machine (a 366MHz Celeron).
> > -mcpu=athlon-xp gives even better results.
> >
> > All with -O3 -fomit-frame-pointer
> > -mcpu-athlon-xp       48.42 real        47.31 user         0.41 sys
> >                       51.22 real        50.10 user         0.30 sys
> > -mcpu=i386            51.98 real        50.18 user         0.34 sys
> > -mcpu=pentiumpro      56.38 real        55.26 user         0.34 sys
> > -mcpu=pentium2        56.24 real        55.25 user         0.36 sys
> > -mcpu=pentium3        56.59 real        55.25 user         0.40 sys
> > -mcpu=pentium4        58.52 real        56.96 user         0.36 sys
> > -mcpu=i486            79.17 real        77.69 user         0.32 sys
> > -mcpu=i586            74.80 real        73.07 user         0.48 sys
> >
> > This is just one benchmark, chosen for its potential optimizability.
> > I only did non-exhaustive benchmarks for the makeworld benchmark.  I
> > removed the -mpentiumpro change when I saw the kernel size bloat that
> > it gave.
>
> Does -mcpu=althon-xp perform worse than the default in other benchmarks that
> you've run?

I haven't run enough to be sure.  It's hard to test all the combinations
for long enough.  Some quick tests with the cc1 application/benchmark:

cc1 compiled with -O3 -fomit-frame-pointer, and:
    -mcpu=i386        (code o3)
    -mcpu=i486        (o4)
    -mcpu=pentiumpro  (op)
    -mcpu=athlon-xp   (oa)

Times for the "all" part of "make obj; make depend; make all" starting
with an empty object tree and source tree = src/bin on the Celeron and
src/usr.sbin on the Athlon (it doesn't complete because it wants to link
to never-installed unbuilt libraries, but it gets a fair way).
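For anyone wanting to repeat this kind of measurement, the procedure described above could be scripted roughly as follows. This is only a sketch under assumptions, not the script actually used for these numbers: the helper names `run_timed` and `smallest_real` are made up for illustration, the step that empties the object tree is left to the caller because the post does not say how it was done, and the user/sys figures rely on `os.times()` reporting child CPU time, which is only meaningful on Unix-like systems.

```python
import os
import subprocess
import time


def run_timed(cmd, cwd=None):
    """Run cmd through the shell once; return (real, user, sys) in seconds."""
    before = os.times()
    start = time.perf_counter()
    # check=False: the build described above is expected to stop early when
    # it tries to link against libraries that were never built.
    subprocess.run(cmd, shell=True, cwd=cwd, check=False,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    real = time.perf_counter() - start
    after = os.times()
    # children_* accumulate the CPU time of child processes that were waited for.
    user = after.children_user - before.children_user
    sys_s = after.children_system - before.children_system
    return real, user, sys_s


def smallest_real(timed_cmd, prepare_cmd=None, runs=2, cwd=None):
    """Repeat (untimed prepare, timed command) and keep the run with the
    smallest real time."""
    results = []
    for _ in range(runs):
        if prepare_cmd is not None:
            # Untimed setup, e.g. "make obj; make depend" on a fresh object tree.
            subprocess.run(prepare_cmd, shell=True, cwd=cwd, check=False,
                           stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        results.append(run_timed(timed_cmd, cwd=cwd))
    return min(results, key=lambda r: r[0])


# Hypothetical usage (the path is an assumption; empty the object tree first):
# real, user, sys_s = smallest_real("make all",
#                                   prepare_cmd="make obj; make depend",
#                                   cwd="/usr/src/bin")
# print(f"{real:.2f} real {user:.2f} user {sys_s:.2f} sys")
```

Only the "make all" step is timed, matching the description above. Selecting which cc1 build the compiler driver uses is not covered by the sketch.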
Smallest real time for 2 runs:

On a Celeron 400 with source tree src/bin:
o3:      121.94 real        97.14 user        19.94 sys
o4:      130.83 real       106.59 user        19.07 sys
oa:      122.69 real        97.58 user        19.39 sys
op:      124.01 real        99.54 user        19.56 sys

All non-null -mcpu settings are pessimizations, with -mcpu=i486
significantly bad and -mcpu=pentiumpro probably significantly bad.
Optimizing the pentiumpro class machine as an athlon-xp works better
(less worse here) than optimizing it as a pentiumpro in this benchmark
too, but the differences are smaller.

On an Athlon-XP1600 overclocked with source tree src/usr.sbin:
o3:       67.62 real        57.46 user         9.53 sys
o4:       69.09 real        57.65 user        10.20 sys
oa:       67.53 real        56.78 user         9.62 sys
op:       68.14 real        57.47 user         9.70 sys

Most of the differences are too small to be significant.  Optimizing
the athlon-xp as an athlon-xp at least doesn't pessimize it.

My integer-crunching benchmark shows similarly small differences on
freefall, but that may be just because freefall's gcc is so old.

> > > > Note that CPUTYPE has worse bugs for i386's.  Setting it to a supported
> > > > CPU gives -march instead of -mcpu, so using it gives unportable
> > > > binaries, and bsd.cpu.mk provides no way to get the corresponding -mcpu
> > > > settings.  OTOH, CPUTYPE for alphas gives only -mcpu.
> > > >
> > > That is by design.  Note that on all non-i386 architectures such as
> > > alpha, etc. -mcpu means the same thing as -march.  The other
> > > architectures use -mtune to get the same effect as -mcpu on i386.
> >
> > Doesn't make it any less of a bug.
>
> The intent of CPUTYPE is that you can have ports and world optimized for the
> specific machine you are compiling on, it is not set to anything by default,
> so the user only gets -march=foo if they explicitly ask for it.  I fail to
> see how that is a bug.

It is a bug because it implements the least useful option set first.

Bruce

Received on Fri Jan 23 2004 - 03:34:47 UTC