Re: Optimization bug with floating-point?

From: Steve Kargl <sgk_at_troutmask.apl.washington.edu>
Date: Wed, 13 Mar 2019 11:08:06 -0700

On Wed, Mar 13, 2019 at 10:16:12AM -0700, John Baldwin wrote:
> On 3/13/19 9:40 AM, Steve Kargl wrote:
> > On Wed, Mar 13, 2019 at 09:32:57AM -0700, John Baldwin wrote:
> >> On 3/13/19 8:16 AM, Steve Kargl wrote:
> >>> On Tue, Mar 12, 2019 at 07:45:41PM -0700, Steve Kargl wrote:
> >>>>
> >>>> gcc8 --version
> >>>> gcc8 (FreeBSD Ports Collection) 8.3.0
> >>>>
> >>>> gcc8 -fno-builtin -o z a.c -lm && ./z
> >>>> gcc8 -O -fno-builtin -o z a.c -lm && ./z
> >>>> gcc8 -O2 -fno-builtin -o z a.c -lm && ./z
> >>>> gcc8 -O3 -fno-builtin -o z a.c -lm && ./z
> >>>>
> >>>> Max ULP: 2.297073
> >>>> Count: 0           (# of ULP that exceed 21)
> >>>>
> >>>
> >>> clang agrees with gcc8 if one changes ...
> >>>
> >>>> int
> >>>> main(void)
> >>>> {
> >>>>    double re, im, u, ur, ui;
> >>>>    float complex f;
> >>>>    float x, y;
> >>>
> >>> this line to "volatile float x, y".
> >>
> >> So it seems to be a regression in clang 7 vs clang 6?
> >>
> > 
> > /usr/local/bin/clang60 has the same problem.  
> > 
> > % /usr/local/bin/clang60 -o z -O2 a.c -lm && ./z
> >   Maximum ULP: 23.061242
> > # of ULP > 21: 39
> > 
> > Adding volatile as in the above "fixes" the problem.
> > 
> > AFAICT, this a i386/387 code generation problem.  Perhaps,
> > an alignment issue?
> 
> Oh, I misread your earlier e-mail to say that clang60 worked.
> 
> One issue I'm aware of is that clang does not have any support for the
> special arrangement FreeBSD/i386 uses where it uses different precision
> for registers vs in-memory for some of the floating point types (GCC has
> a special hack that is only used on FreeBSD for this but isn't used on
> any other OS's).  I wonder if that could be a factor?  Volatile probably
> forces a round trip between memory which might explain why this is the
> case.
> 
> I wonder what your test program does on i386 Linux with GCC?

I don't have an i386 Linux environment.  I tried comparing the
assembly generated with and without volatile, but that proved
difficult: register numbers change between the two listings,
so almost all lines mismatch.

If I move ranged(), rangef(), dp_csinh(), and ulpfd() into b.c
so a.c only contains main(), add appropriate prototypes to a.c,
and comment out the printf() statements, I still see the problem.
Judging from the diff, there is a difference in the spills and
loads in 2 places.

% diff -uw without_volatile with_volatile
--- without_volatile	2019-03-13 10:51:33.244226000 -0700
+++ with_volatile	2019-03-13 10:51:54.088095000 -0700
@@ -35,11 +35,13 @@
 	movl	%esi, 68(%esp)          # 4-byte Spill
 	calll	rangef
 	fadds	.LCPI0_0
-	fstpl	24(%esp)                # 8-byte Folded Spill
+	fstps	28(%esp)
 	calll	rangef
 	fadds	.LCPI0_1
-	fstl	100(%esp)               # 8-byte Folded Spill
-	fldl	24(%esp)                # 8-byte Folded Reload
+	fstps	24(%esp)
+	flds	28(%esp)
+	flds	24(%esp)
+	fxch	%st(1)
 	fstps	48(%esp)
 	fstps	52(%esp)
 	movl	48(%esp), %eax
@@ -49,13 +51,13 @@
 	calll	csinhf
 	movl	%eax, %esi
 	movl	%edx, %edi
+	flds	28(%esp)
+	flds	24(%esp)
 	leal	72(%esp), %eax
 	movl	%eax, 20(%esp)
 	leal	80(%esp), %eax
 	movl	%eax, 16(%esp)
-	fldl	100(%esp)               # 8-byte Folded Reload
 	fstpl	8(%esp)
-	fldl	24(%esp)                # 8-byte Folded Reload
 	fstpl	(%esp)
 	calll	dp_csinh
 	movl	%esi, 40(%esp)
@@ -75,7 +77,7 @@
 	fnstsw	%ax
                                         # kill: def $ah killed $ah killed $ax
 	sahf
-	fstl	24(%esp)                # 8-byte Folded Spill
+	fstl	100(%esp)               # 8-byte Folded Spill
 	ja	.LBB0_3
 # %bb.2:                                # %for.body
                                         #   in Loop: Header=BB0_1 Depth=1
@@ -114,7 +116,7 @@
                                         #   in Loop: Header=BB0_1 Depth=1
 	fstp	%st(2)
 	fldl	92(%esp)                # 8-byte Folded Reload
-	fldl	24(%esp)                # 8-byte Folded Reload
+	fldl	100(%esp)               # 8-byte Folded Reload
 	fucomp	%st(1)
 	fnstsw	%ax
                                         # kill: def $ah killed $ah killed $ax
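
One possible reading of the diff above, offered as a sketch and not as a
confirmed diagnosis: without volatile, the result of rangef() plus a
constant is spilled as an 8-byte value (fstpl/fstl) and that wider value is
later handed to dp_csinh(), while csinhf() receives the same value stored
as a 4-byte float (fstps).  With volatile, both paths reload the 4-byte
value.  The portable C below imitates that situation with explicit double
arithmetic.  The helper names wide_sum(), narrow_sum(), and input_gap() are
hypothetical and do not appear in a.c.

```c
/*
 * Hypothetical stand-in for the value left in an x87 register after
 * "rangef() + constant": the sum is kept wider than float.
 */
static double wide_sum(float a, float b)
{
    return (double)a + (double)b;
}

/*
 * Same sum, but stored through a volatile float, which forces the
 * value to be rounded to float precision in memory.
 */
static float narrow_sum(float a, float b)
{
    volatile float x = (float)wide_sum(a, b);
    return x;
}

/*
 * Difference between the argument a double-precision reference would
 * see (unrounded) and the argument a float routine would see (rounded).
 * A nonzero gap means the two routines are evaluated at different points.
 */
static double input_gap(float a, float b)
{
    return wide_sum(a, b) - (double)narrow_sum(a, b);
}
```

For example, input_gap(1.0f, 0x1p-30f) is nonzero because 1 + 2^-30 is not
representable as a float, whereas input_gap(1.0f, 1.0f) is zero.  If the
reference and the float routine are fed inputs that differ this way, the
measured ULP error can grow even when both routines are correct.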

Including ieeefp.h in a.c and calling fpsetprec(FP_PE) in main()
produces a massive diff, but the results are still wrong if
volatile is not used.
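
For reference, a minimal sketch of what the fpsetprec() experiment might
look like.  This is not the code from a.c.  The function names
request_extended_precision() and eval_method() are hypothetical, and the
preprocessor guard is an assumption added so that the fragment still
compiles on systems without ieeefp.h.

```c
#include <float.h>

#if defined(__FreeBSD__) && defined(__i386__)
#include <ieeefp.h>     /* fpsetprec(), FP_PE */
#endif

/*
 * Ask the x87 unit for extended (64-bit mantissa) precision.
 * Returns 1 if the request was made, 0 if this platform has no
 * fpsetprec() under the guard assumed above.
 */
static int request_extended_precision(void)
{
#if defined(__FreeBSD__) && defined(__i386__)
    (void)fpsetprec(FP_PE);
    return 1;
#else
    return 0;
#endif
}

/*
 * Report how the compiler says it evaluates floating-point
 * expressions (FLT_EVAL_METHOD from <float.h>), or -1 if the
 * macro is not defined.
 */
static int eval_method(void)
{
#ifdef FLT_EVAL_METHOD
    return FLT_EVAL_METHOD;
#else
    return -1;
#endif
}
```

Calling request_extended_precision() at the top of main() and printing
eval_method() next to the ULP results would record which precision setting
and evaluation mode each run used.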

Clang appears to be broken for FP on i386/387.

-- 
Steve