Re: CURRENT: CLANG 3.3 and -stad=c++11 and -stdlib=libc++: isnan()/isninf() oddity

From: Bruce Evans <brde_at_optusnet.com.au> Date: Fri, 12 Jul 2013 01:28:26 +1000 (EST) · This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:39 UTC

On Thu, 11 Jul 2013, David Chisnall wrote:

> On 11 Jul 2013, at 13:11, Bruce Evans <brde_at_optusnet.com.au> wrote:
>
>> <math.h> is also not required to be conforming C code, let alone C++ code,
>> so there is only a practical requirement that it works when included
>> in the C++ implementation.
>
> Working with the C++ implementation is the problem that we are trying to solve.
>
>> The compatibility that I'm talking about is with old versions of FreeBSD.
>> isnan() is still in libc as a function since that was part of the FreeBSD
>> ABI and too many things depended on getting it from there.  It was recently
>> ...
>
> I don't see a problem with changing the name of the function in the header and leaving the old symbol in libm for legacy code.

I don't even see why old code needs the symbol.  Old code should link to
old compat libraries that still have it.

>
>>> It would also be nice to implement these macros using _Generic when compiling in C11 mode, as it will allow the compiler to produce more helpful warning messages.  I would propose this implementation:
>>
>>> #if __has_builtin(__builtin_isnan)
>>
>> This won't work for me, since I develop and test msun with old compilers
>> that don't support __has_builtin().  Much the same set of compilers also
>> don't have enough FP builtins.
>
> Please look in cdefs.h, which defines __has_builtin(x) to 0 if we the compiler does not support it.  It is therefore safe to use __has_builtin() in any FreeBSD header.

The old compilers run on old systems that don't have that in cdefs.h
(though I sometimes edit it to add compatibility cruft like that).  msun
sources are otherwise portable to these systems.  Well, not quite.  They
are not fully modular and also depend on stuff in libc/include and
libc/${ARCH}.  I have to update or edit headers there.

This hack also doesn't work with gcc in -current.  gcc has __builtin_isnan
but not __has_builtin(), so __has_builtin(__builtin_isnan) gives the wrong
result 0.

>> It also doesn't even work.  clang has squillions of builtins that
>> aren't really builtines so they reduce to libcalls.
>
> Which, again, is not a problem for code outside of libm.  If libm needs different definitions of these macros then that's fine, but they should be private to libm, not installed as public headers.

Yes it is.  It means that nothing should use isnan() or FP_FAST_FMA* outside
of libm either, since isnan() is too slow and FP_FAST_FMA* can't be trusted.
Even the implementation can't reliably tell if __builtin_isnan is usuable
or better than alternatives.

>> The msun implementation knows that isnan() and other classification
>> macros are too slow to actually use, and rarely uses them.
>
> Which makes any concerns that only apply to msun internals irrelevant from the perspective of discussing what goes into this header.

No, the efficiency of isnan() is more important for externals, because the
internals already have work-arounds.

>>> #define isnan(x) __builtin_isnan(x)
>>> #else
>>> static __inline int
>>> __isnanf(float __x)
>>> {
>>>       return (__x != __x);
>>> }
>>
>> Here we can do better in most cases by hard-coding this without the ifdef.
>
> They will generate the same code.  Clang expands the builtin in the LLVM IR to a fcmp uno, so will generate the correct code even when doing fast math optimisations.

On some arches the same, and not affected by -ffast-math.  But this
is not necessarily the fastest code, so it is a performance bug if clang
akways generates the same code for the builtin.  Bit tests are faster in
some cases, and may be required to prevent exceptions for signaling NaNs.
-ffast-math could reasonably optimize x != x to "false".  It already assumes
that things like overflow and NaN results can't happen, so why not optimize
further by assuming that NaN inputs can't happen?

>> Generic stuff doesn't seem to work right for either isnan() or
>> __builtin_isnan(), though it could for at least the latter.  According
>> to a quick grep of strings $(which clang), __builtin_classify() is
>> generic but __builtin_isnan*() isn't (the former has no type suffixes
>> but the latter does, and testing shows that the latter doesn't work
>> without the suffices).
>
> I'm not sure what you were testing:

Mostly isnan() without including <math.h>, and gcc.  I was confused by
gcc converting floats to doubles.

> $ cat isnan2.c
>
> int test(float f, double d, long double l)
> {
>        return __builtin_isnan(f) |
>                __builtin_isnan(d) |
>                __builtin_isnan(l);
> }
> $ clang isnan2.c -S -emit-llvm -o - -O1
> ...
>  %cmp = fcmp uno float %f, 0.000000e+00
>  %cmp1 = fcmp uno double %d, 0.000000e+00
>  %or4 = or i1 %cmp, %cmp1
>  %cmp2 = fcmp uno x86_fp80 %l, 0xK00000000000000000000
> ...
>
> As you can see, it parses them as generics and generates different IR for each.  I don't believe that there's a way that these would be translated back into libcalls in the back end.

Yes, most cases work right.  gcc converts f to double and compares the
result, but that mostly works.  It would be just a pessimization except
te conversion gives an exception for signaling NaNs.  I hope I remember
correctly that the comparision doesn't give this exception.  When the
above is changed to use __builtin_isnanf() on the float, gcc does the
right thing (no conversion).  The almost-correct handling of the float
is apparently not completely accidental, since for the long double gcc
doesn't convert to double before comparing.

With gcc, you also don't have to use __builtin_isnan*() to get the
builtins.  gcc doesn't warn about the undeclared isnan(), and generates
the same code as the above.

You can also use isnanf() on the float to avoid the conversion.
isnanf() is nonstandard, so gcc warns about it.  If isnan() is declared
as a function as in <math.h>, then gcc forgets that it is standard and
generates a conversion for the long double case too.  This almost works,
like the conversion for the float case, since changing the precision
doesn't affect the classification

With clang, you only get the builtins if you use them explicitly.
clang generates verbose warnings for undeclared isnan() and always
calls the named function.  Conversions occur according to the prototype,
if any.

The above was tested only on amd64.

isinf() is even more interesting and broken.  Now gcc doesn't convert
automatically to builtins.  It does the reverse, in broken ways when
__builtin_isinf() is used on floats and long doubles:
- it converts the float to double to create the arg.  Mostly works, as above.
- it doesn't convert the long double to double, but copies it to the stack
   pessimally through integer registers.  The pessimization can be fixed
   by compiling with -march=core2.
isinf() is called twice with a double arg and once with a long double arg.
This shows that gcc doesn't really understand its own builtin, but is just
passing the values with the default promotions that occur when there is
no prototype in scope.

When __builtin_isinff() is used on the float arg and __builtin_isinfl()
is used on the long double arg, gcc passes the args correctly except
for pessimizing the long double.  It converts __builtin_isinff() to
isinff(), etc.  This is very broken, since isinff() is in the application
namespace.  Apart from that, it causes the following problems:
- isinff(), isinf() and isinfl() can't easily be removed from libraries
- in general, there is no way to know the name of a libcall functions that
   might be generated for builtins that don't really exist.  gcc also
   converts __builtin_fma() to fma().  That works and is not a namespace
   error since fma() is a standard API that must exist as a function.

clang does different things wrong for __builtin_isinf().  It gets the
type stuff right, as for __builtin_isnan().  It never generates
libcalls, so it doesn't need a differently named set of extern
functions.  But it has the major pessimization of calling extern
functions for fabs(), fabsf() and fabsl().  The latter is weird, since
clang optimizes explicit fabs*() calls almost perfectly on amd64 using
SSE bit operations for floats and doubles and i387 hardware fabs for
long doubles (bit operations also work for long doubles, but are less
convenient and much slower when done wrong).

I probably knew some of the previous paragraph when I wrote a local
isinf() for catrig*.c.  catrig*.c is unlike most of msun.  It uses
classification macros a lot.  Efficiency investigations showed that
these work OK provided they always use efficient builtins and in
particular, never use the standard macros (except isnan() = itself ->
builtin for gcc).  isinf(x) is just (fabs(x) == INFINITY), like clang
generates for __builtin_isinf(), but much better since the fabs() is
explicit so clang doesn't neglect to optimize it.  This works well for
gcc too (3.x and 4.2.1).

Bruce