Re: Use of C99 extra long double math functions after r236148

From: Steve Kargl <sgk_at_troutmask.apl.washington.edu>
Date: Sun, 8 Jul 2012 22:02:38 -0700
On Sun, Jul 08, 2012 at 10:41:44PM -0500, Stephen Montgomery-Smith wrote:
> On 07/08/2012 09:01 PM, Steve Kargl wrote:
> 
> >Have you read Goldberg's paper?
> 
> I must admit that I had not.  I found it at:
> 
> http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html

Yes, it's easy to find.  Unfortunately, many people don't
know that it exists.  I've read that paper between 10 and
20 times and I still learn something new with every reading.

> >Not to mention, I've seen way too many examples of 'x - y'
> >where cancellation of significant digits causes
> >problems.  Throw in rather poor estimates of function
> >results with real poor ULP and you have problems.
> 
> I think that if a program includes "x-y" where cancellation causes huge 
> loss of digits, then the program should be considered highly flawed. 
> The programmer might get lucky, and a few extra ULP might save his skin, 
> but it would be better if the program catastrophically failed so that he 
> would realize he needs to do something smarter.
> 
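As a tiny aside on the "x-y" problem (the library calls below are standard
C-style math, but the example itself is mine): computing 1 - cos(x) for
small x cancels almost every significant digit, while the algebraically
equivalent 2*sin(x/2)^2 keeps nearly full precision.

```python
import math

x = 1.0e-8

# Naive form: cos(1e-8) is so close to 1 that the subtraction
# cancels essentially all significant digits.
naive = 1.0 - math.cos(x)

# Rewritten form: no subtraction of nearly equal numbers, so the
# result keeps nearly full double precision.
stable = 2.0 * math.sin(0.5 * x) ** 2

print(naive, stable)  # the true value is about 5.0e-17
```

No amount of extra ULP in cos() rescues the naive form; the fix has to
come from the programmer rewriting the expression.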
> I got my first scientific calculator around 1975, and I remember my 
> surprise when I discovered that (1/3)*3-1 on my calculator did not 
> produce zero.  Years later I bought another calculator, and imagine my 
> further surprise when (1/3)*3-1 did produce zero.  They must have done 
> something very smart, I thought.

Yes.  They used CORDIC arithmetic instead of (binary) floating point.

> The problem is, these kinds of tricks can only help you so much. 
> Calculations like 
> arccos(arccos(arccos(arccos(cos(cos(cos(cos(x))))))))-x are probably 
> not going to be correct no matter how smart the programmer is.

Well, I would claim that any programmer who wrote such an
expression is not smart.

> The paper by Goldberg has some really nice stuff.  I teach a class on 
> numerical methods.

It may be profitable to at least have your students read the
paper.  I don't know about others, but I find doing floating
point arithmetic correctly very difficult.

> One example I present is the problem using equation 
> (4) of Goldberg's paper to solve quadratic equations.  I tell them that 
> the smart thing to do is to use equation (4) or equation (5) depending 
> on whether b has the same sign as sqrt(b^2-4ac).  This is a very good 
> illustration of how to overcome the "x-y" problem, and in my opinion it 
> has to be understood by the programmer, not the writer of the 
> square-root package.  Trying to compensate by getting that lost drop of 
> ULP out of square root is looking in the wrong direction.

Careful.  IEEE 754 requires, and one can prove, that sqrt()
can be computed correctly rounded (error of at most 1/2 ULP)
for all inputs in all rounding modes.  Computing the roots
of the quadratic equation, which uses the sqrt() function,
is a different matter.
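To make the equation (4)/(5) switch concrete, here is a minimal sketch
(the function and the test values are mine, not from Goldberg): choose
the formula for each root so that -b and the square root are always
added, never subtracted.

```python
import math

def quadratic_roots(a, b, c):
    """Roots of a*x^2 + b*x + c = 0 via Goldberg's eqs. (4) and (5).
    Assumes a != 0, a nonnegative discriminant, and not b == c == 0."""
    d = math.sqrt(b * b - 4.0 * a * c)
    # copysign gives d the sign of b, so b and d never cancel in q.
    q = -0.5 * (b + math.copysign(d, b))
    return q / a, c / q   # eq. (4) gives one root, eq. (5) the other

# With b >> a, c the textbook formula loses the small root to cancellation.
naive_small = (-1.0e8 + math.sqrt(1.0e16 - 4.0)) / 2.0
_, stable_small = quadratic_roots(1.0, 1.0e8, 1.0)
print(naive_small, stable_small)  # the true small root is about -1.0e-8
```

The sqrt() itself is correctly rounded in both versions; only the
surrounding algebra differs.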

> But there is other stuff in his paper I don't like, and it comes across 
> as nit-picking rather than really thinking through why he wants the 
> extra accuracy.  I feel like he is saving the pennies, but losing the 
> dollars.

Well, in Goldberg's defense, think about the title of the paper.
I think his major aim is to get scientists to think about
what they compute and how they compute it.

> Similarly, I am not a fan of the Kahan Summation Formula (Theorem 8). 

Oh my.  I'm a fan of Kahan's summation formula.  But I assume
that one applies it to situations where it is needed.
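For reference, the formula is only a few lines (the names below are
mine); it carries the rounding error of each addition along in a
separate compensation variable:

```python
def kahan_sum(values):
    """Compensated (Kahan) summation of an iterable of floats."""
    total = 0.0
    comp = 0.0                   # running compensation for lost low-order bits
    for v in values:
        y = v - comp             # apply the correction from the previous step
        t = total + y            # low-order bits of y may be lost here...
        comp = (t - total) - y   # ...algebraically zero, so it recovers them
        total = t
    return total

# 0.1 is inexact in binary; adding it a million times lets the
# per-step rounding errors accumulate in the naive sum.
data = [0.1] * 1_000_000
print(sum(data), kahan_sum(data))
```

Measured against math.fsum(), the compensated total lands far closer to
the exact sum than the naive loop does, at a cost of three extra
floating-point operations per element.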

> There are two reasons why I might compute large sums.  One is 
> accounting, when I better be using high enough precision that the 
> arithmetic is exact.

If you're doing accounting, hopefully you're using BCD.
Accounting and floating point simply do not mix.

>  The other is something like numerical integration. 
>  But in the latter case, unless the summands have a very special form, 
> it is likely that each summand will only be accurate to within e.  Thus 
> the true sum is only known to within n*e, and so the naive method really 
> hasn't lost any precision at all.  And typically n*e will be so small 
> compared to the true answer that it won't matter.  If it does matter, 
> the problem is that n is too big.  The algorithm will take decades to 
> run.  Focus efforts on producing a different numerical integration 
> method, instead of getting that lost drop of ULP.  I cannot envision any 
> non-contrived circumstance where the Kahan summation formula is going to 
> give an answer that is significantly more reliable than naive summation. 
> I can envision a circumstance when the reduction in speed of the Kahan 
> summation formula over naive summation could be significant.

I can envision a use.  Try computing the RMS height of a
numerically generated random rough surface in the scattering
of an incident acoustic field from said surface.  The RMS
height is a main component in the first-order perturbation
theory expansion of the scattered field.  Get the RMS height
wrong, and you have no idea what the scattered field strength
is.

> I think the example I like best that illustrates floating point errors 
> is inverting badly conditioned matrices.  And any scientist who works 
> with large matrices had better know what ill-conditioned means.  The 
> proper answer is that if your matrix is ill-conditioned, give up.  You 
> need to look at the problem where your matrix came from, and rethink how 
> you convert it into a matrix problem, so that the matrix is not 
> ill-conditioned.  Chasing that last drop of ULP simply will not help one 
> iota.
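A two-by-two system is already enough to see this (a toy example of my
own construction): the rows below are nearly parallel (condition number
roughly 4e4), so nudging the right-hand side by five parts in 10^5
moves the solution by order one.

```python
def solve2x2(a11, a12, a21, a22, b1, b2):
    """Solve a 2x2 linear system by Cramer's rule (fine at this size)."""
    det = a11 * a22 - a12 * a21
    return ((b1 * a22 - b2 * a12) / det,
            (a11 * b2 - a21 * b1) / det)

# Unperturbed system: solution is (1, 1).
x = solve2x2(1.0, 1.0, 1.0, 1.0001, 2.0, 2.0001)
# Same matrix, b2 changed by 5e-5 relative: solution jumps to (0, 2).
y = solve2x2(1.0, 1.0, 1.0, 1.0001, 2.0, 2.0002)
print(x, y)
```

The arithmetic here is nearly exact; the wild swing comes entirely from
the conditioning of the matrix, which no amount of extra ULP in the
solver can repair.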

Yep.  Another example is the use of upward recursion to compute
Bessel functions where the order is larger than the argument.
The algorithm is known to be unstable.
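A sketch of that instability (the helper names and the starting order
are mine; J0(1) and J1(1) are the standard tabulated values): errors in
the upward recurrence grow like the unbounded second solution Y_n,
while Miller's downward recurrence with normalization remains stable.

```python
J0, J1 = 0.7651976865579666, 0.4400505857449335   # J_0(1), J_1(1)

def j_upward(n, x):
    """Upward recurrence J_{k+1} = (2k/x)*J_k - J_{k-1}; unstable for n > x."""
    a, b = J0, J1
    for k in range(1, n):
        a, b = b, (2.0 * k / x) * b - a
    return b

def j_downward(n, x, start=50):
    """Miller's algorithm (assumes n < start): recurse downward from
    arbitrary trial values, then normalize against the known J_0."""
    jkp1, jk = 0.0, 1.0e-30          # trial seed at orders start+1 and start
    jn = 0.0
    for k in range(start, 0, -1):
        jkp1, jk = jk, (2.0 * k / x) * jk - jkp1   # jk becomes J_{k-1}
        if k - 1 == n:
            jn = jk
    return jn * (J0 / jk)            # jk now holds the unnormalized J_0

# J_20(1) is about 3.9e-25; the upward recurrence returns pure noise.
print(j_upward(20, 1.0), j_downward(20, 1.0))
```

Downward, the wanted J_n is the dominant solution of the recurrence, so
the error from the arbitrary seed dies out instead of exploding.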

-- 
Steve
Received on Mon Jul 09 2012 - 03:02:39 UTC
