Re: Compiler performance tests on FreeBSD 10.0-CURRENT

From: Steve Kargl <sgk_at_troutmask.apl.washington.edu> Date: Wed, 5 Sep 2012 15:13:11 -0700

On Wed, Sep 05, 2012 at 11:31:26AM +0200, Dimitry Andric wrote:
> On 2012-09-05 01:40, Garrett Cooper wrote:
> ...
> >     Steve does have a point. Posting the results of
> >CFLAGS/CPPFLAGS/LDFLAGS/etc for config.log (and maybe poking through
> >the code to figure out what *FLAGS were used elsewhere) is more
> >valuable than the data is in its current state (unfortunately..
> >autoconf makes things more complicated).
> 
> 1) For building the FreeBSD in-tree version of clang 3.2:
> 
>      -O2 -pipe -fno-strict-aliasing
> 
> 2) For building the FreeBSD in-tree version of gcc 4.2.1:
> 
>      -O2 -pipe
> 
> 3) For building Boost 1.50.0:
> 
>      -ftemplate-depth-128 -O3 -finline-functions
> 

Dimitry, thanks for the follow-up.  I performed an unscientific
(micro)benchmark of /usr/bin/cc vs /usr/bin/clang, where cc is
the base system's gcc 4.2.1.  Here's what I found/feared.

Compiling libm on 

CPU: AMD Opteron(tm) Processor 248 (2192.01-MHz K8-class CPU)
  Origin = "AuthenticAMD"  Id = 0xf5a  Family = f  Model = 5  Stepping = 10
  Features=0x78bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,\
                     MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2>
  AMD Features=0xe0500800<SYSCALL,NX,MMX+,LM,3DNow!+,3DNow!>

with default CFLAGS (i.e., -O2 -pipe) and -march=opteron.

Using 'setenv CC /usr/bin/cc' with 3 runs of

make clean
time make -DNO_MAN

yields

       69.39 real        52.00 user        38.55 sys
       69.57 real        52.35 user        38.37 sys
       69.48 real        52.25 user        38.38 sys

Now, repeating with 'setenv CC /usr/bin/clang' yields

       39.65 real        21.86 user        17.37 sys
       40.91 real        21.48 user        17.91 sys
       39.77 real        21.65 user        17.64 sys

So, clang does appear to be faster in this particular
compile-speed benchmark: roughly 40 seconds of wall-clock
time per build versus roughly 69.5 seconds for gcc.
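
For a rough sense of scale, the wall-clock figures above can be
reduced to a mean per compiler and a ratio.  The Python sketch below
is only an illustration built from the "real" times quoted above; the
names gcc_real, clang_real and speed_ratio are made up for this
example, and three runs on one machine are not a statistically
meaningful sample.

```python
from statistics import mean

# "real" (wall-clock) seconds copied from the three runs per compiler above.
gcc_real = [69.39, 69.57, 69.48]
clang_real = [39.65, 40.91, 39.77]


def speed_ratio(baseline, candidate):
    """Return mean(baseline) / mean(candidate); > 1 means candidate is faster."""
    return mean(baseline) / mean(candidate)


ratio = speed_ratio(gcc_real, clang_real)
print(f"gcc mean   : {mean(gcc_real):.2f} s")
print(f"clang mean : {mean(clang_real):.2f} s")
print(f"ratio      : {ratio:.2f}x")
```

With these numbers the ratio comes out at roughly 1.7, i.e. the clang
build of libm took a bit under 60% of the gcc build's wall-clock time.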

However, if I now build my test program for libm's j0f()
function, where the only difference is whether libm was
built with /usr/bin/cc or /usr/bin/clang, I observe the
following results.

1234567 x values in the interval [0:25]    

                         gcc libm    |   clang libm
                     ----------------|-----------------
      ULP <= 0.6 --> 565515 (45.81%) | 513763 (41.61%)
0.6 < ULP <= 0.7 --> 74148  ( 6.01%) | 67221  ( 5.44%)
0.7 < ULP <= 0.8 --> 69112  ( 5.60%) | 62846  ( 5.09%)
0.8 < ULP <= 0.9 --> 63798  ( 5.17%) | 58217  ( 4.72%)
0.9 < ULP <= 1.0 --> 58679  ( 4.75%) | 53834  ( 4.36%)
1.0 < ULP <= 2.0 --> 328221 (26.59%) | 306728 (24.84%)
2.0 < ULP <= 3.0 --> 65323  ( 5.29%) | 63452  ( 5.14%)
3.0 < ULP        --> 9771   ( 0.79%) | 108506 ( 8.79%)

                    gcc libm         |     clang libm
              -----------------------|--------------------
     MAX ULP: 12152.27637            | 1129606938624.00000
x at MAX ULP: 5.520077 0x1.6148f2p+2 | 2.404833 0x1.33d19p+1
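
The test program itself is not shown here, so the following Python
sketch is only an assumption about how numbers like these can be
produced: the error of a single-precision result is divided by the
spacing of single-precision values (one ULP) at the reference value,
and the errors are then counted in the same buckets as the table
above.  All names (to_float32, ulp32, ulp_error, histogram, EDGES)
are hypothetical, and the reference value is assumed to come from a
higher-precision evaluation.

```python
import math
import struct
from bisect import bisect_left

# Upper bucket edges matching the table above; the last bucket is "3.0 < ULP".
EDGES = [0.6, 0.7, 0.8, 0.9, 1.0, 2.0, 3.0]


def to_float32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def ulp32(ref):
    """Spacing of single-precision numbers at the magnitude of ref."""
    if ref == 0.0:
        return math.ldexp(1.0, -149)  # smallest subnormal float32
    _, e = math.frexp(abs(ref))  # abs(ref) = m * 2**e with 0.5 <= m < 1
    # 24-bit significand; clamp the exponent so subnormals share one spacing.
    return math.ldexp(1.0, max(e, -125) - 24)


def ulp_error(got, ref):
    """Error of a float32 result, measured in float32 ULPs of the reference."""
    return abs(got - ref) / ulp32(ref)


def histogram(errors):
    """Count errors per bucket; index i holds EDGES[i-1] < err <= EDGES[i]."""
    counts = [0] * (len(EDGES) + 1)
    for err in errors:
        counts[bisect_left(EDGES, err)] += 1
    return counts


# Tiny demo: a correctly rounded value, and one that is off by one float32 step.
demo = [ulp_error(to_float32(0.1), 0.1), ulp_error(1.0 + 2.0**-23, 1.0)]
print(histogram(demo))
```

Under this definition a tiny reference value means a tiny ULP, so a
small absolute error near a zero of the function turns into a very
large ULP count.  Both x values reported at MAX ULP lie close to zeros
of J0 (about 2.4048 and 5.5201), which may be why the maxima sit
there, although that is an inference and not something measured above.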

Speed test with gcc libm.
1234567 j0f calls in 0.193427 seconds.
1234567 j0f calls in 0.193410 seconds.
1234567 j0f calls in 0.194158 seconds.

Speed test with clang libm.
1234567 j0f calls in 0.180260 seconds.
1234567 j0f calls in 0.180130 seconds.
1234567 j0f calls in 0.179739 seconds.
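
To put the two speed tests on a per-call footing, the elapsed times
can be averaged and divided by the number of calls.  This is just
arithmetic on the figures printed above, sketched in Python; the
names N_CALLS, gcc_secs, clang_secs and ns_per_call are invented for
the illustration.

```python
N_CALLS = 1234567

# Elapsed seconds copied from the speed tests above.
gcc_secs = [0.193427, 0.193410, 0.194158]
clang_secs = [0.180260, 0.180130, 0.179739]


def ns_per_call(elapsed, calls=N_CALLS):
    """Average cost of one call in nanoseconds, taken over all runs."""
    return sum(elapsed) / len(elapsed) / calls * 1e9


gcc_ns = ns_per_call(gcc_secs)
clang_ns = ns_per_call(clang_secs)
print(f"gcc libm   : {gcc_ns:.1f} ns per call")
print(f"clang libm : {clang_ns:.1f} ns per call")
print(f"difference : {(1 - clang_ns / gcc_ns) * 100:.1f}%")
```

That works out to roughly 157 ns versus 146 ns per call, a difference
of about 7%.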

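So, although the clang-built j0f() appears to be faster than
the gcc-built j0f(), the clang-built j0f() is far less
accurate: 8.79% of its results are off by more than 3 ULP,
versus 0.79% for gcc, and its maximum error is vastly larger.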

-- 
Steve