Re: svn commit: r302601 - in head/sys: arm/include arm64/include [clang 3.8.0: powerpc int instead of 32-bit SYSVR4's long and 64-bit ELF V2 long]

From: Mark Millard <markmi_at_dsl-only.net>
Date: Thu, 14 Jul 2016 02:53:29 -0700
[Top post of a history note for powerpc and wchar_t's type in FreeBSD. The history is from looking around in svn.]

[The below is not a complaint or a request for a change. It just looks like int for wchar_t for powerpc was a choice made long ago for simpler code given FreeBSD's pre-existing structure.]

int being used for powerpc wchar_t on FreeBSD goes back to at least 2001-Jan-1. [FYI: "27 February, 2008: FreeBSD 7.0 is the first release to officially support the FreeBSD/ppc port". So long before official support.]

wchar_t's type is one place where FreeBSD chose to override the powerpc (and powerpc64) ABI standards (which indicate long, not int). I'm not sure whether this was done implicitly or with explicit awareness of the ABI mismatch. [The SYSVR4 32-bit powerpc ABI goes back to 1995.]

I first traced the history back to 2002-Aug-23: -r102315 of sys/sys/_types.h standardized FreeBSD on the following until the ARM change:

typedef int             __ct_rune_t;
typedef __ct_rune_t     __rune_t;
typedef __ct_rune_t     __wchar_t;
typedef __ct_rune_t     __wint_t;

Prior to this there was 2002-Aug-21's -r102227 sys/powerpc/include/_types.h that used __int32_t.

Prior to that there were ansi.h and types.h instead of _types.h, and ansi.h had:

#define _BSD_WCHAR_T_   _BSD_CT_RUNE_T_         /* wchar_t (see below) */
. . .
#define _BSD_CT_RUNE_T_ int                     /* arg type for ctype funcs */

Going back to sys/powerpc/include/ansi.h's -r70571 (2001-Jan-1 creation in svn):

#define _BSD_WCHAR_T_   int                     /* wchar_t */

And the comments back then say:

. . . It is not
 * unsigned so that EOF (-1) can be naturally assigned to it and used.
. . . The reason an int was
 * chosen over a long is that the is*() and to*() routines take ints (says
 * ANSI C), but they use __ct_rune_t instead of int.
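
A minimal sketch of the signedness point in that comment, not FreeBSD code: the typedef names rune_signed_t and rune_unsigned_t and both helper functions are hypothetical, and the sketch assumes long long is wider than unsigned int (checked at compile time). It stores EOF in a signed and in an unsigned stand-in and checks whether the value still compares equal to EOF after being widened.

```c
#include <assert.h>  /* static_assert */
#include <stdio.h>   /* EOF */

/* Hypothetical stand-ins; not the FreeBSD typedefs themselves. */
typedef int rune_signed_t;            /* same underlying choice as __ct_rune_t */
typedef unsigned int rune_unsigned_t; /* the alternative the comment rejects */

/* Assumption of this sketch: widening to long long really adds bits. */
static_assert(sizeof(long long) > sizeof(unsigned int),
              "sketch assumes long long is wider than unsigned int");

/* Returns 1: a signed rune keeps EOF negative, so it survives widening. */
static int signed_rune_preserves_eof(void)
{
    rune_signed_t r = EOF;
    long long widened = r;
    return widened == EOF;
}

/* Returns 0: EOF wraps to a large positive value in the unsigned type. */
static int unsigned_rune_preserves_eof(void)
{
    rune_unsigned_t r = (rune_unsigned_t)EOF;
    long long widened = r;
    return widened == EOF;
}
```

Under the stated assumption, signed_rune_preserves_eof() yields 1 and unsigned_rune_preserves_eof() yields 0, which is one reading of why the comment calls assignment of EOF to a signed type "natural".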

I've decided to not go any farther back in time (if there is prior history for wchar_t for powerpc).

Ignoring the temporary __int32_t use: FreeBSD has had its own powerpc wchar_t type (int) for at least the last 15 years, "its own" in the sense that it differs from the powerpc ABI(s) that FreeBSD is based on for powerpc.



Recent gcc versions even have the FreeBSD wchar_t type correct for the powerpc variants: int. Previously the L-based notation (L-prefixed literals) used the wrong type for one of the powerpc variants (32-bit vs. 64-bit), causing lots of false-positive compiler notices. gcc had followed the ABI involved (long int) until the correction.
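
One way to see which choice a given compiler and its headers have made is to ask the compiler directly. The following is a minimal C11 sketch, not taken from FreeBSD or gcc; all function names are made up for illustration. It classifies wchar_t with _Generic and checks that an L'x' literal has the same type as the wchar_t typedef from <stddef.h>, which is the agreement that was missing in the gcc case described here.

```c
#include <assert.h>  /* static_assert */
#include <limits.h>  /* CHAR_BIT */
#include <stddef.h>  /* wchar_t */

/* Compile-time form of the check: fails the build if the compiler's
 * idea of L'x' and the header's wchar_t typedef disagree. */
static_assert(_Generic(L'x', wchar_t: 1, default: 0),
              "type of L'x' differs from the wchar_t typedef");

/* Name of wchar_t's underlying type, limited to the candidates
 * discussed in this note; anything else is reported as "other". */
static const char *wchar_underlying_name(void)
{
    return _Generic((wchar_t)0,
                    int: "int",
                    long: "long",
                    unsigned int: "unsigned int",
                    default: "other");
}

/* 1 if wchar_t is a signed type, 0 if it is unsigned. */
static int wchar_is_signed(void)
{
    return (wchar_t)-1 < 0;
}

/* Storage width of wchar_t in bits. */
static int wchar_width_bits(void)
{
    return (int)(sizeof(wchar_t) * CHAR_BIT);
}

/* Run-time form of the literal-vs-typedef check above. */
static int wide_literal_matches_typedef(void)
{
    return _Generic(L'x', wchar_t: 1, default: 0);
}
```

On a FreeBSD powerpc or powerpc64 target using the types shown in this note, the expectation would be "int", signed, and 32 bits; a toolchain following the ABI documents instead would be expected to report "long".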

===
Mark Millard
markmi at dsl-only.net

On 2016-Jul-13, at 11:46 PM, Mark Millard <markmi at dsl-only.net> wrote:

> On 2016-Jul-13, at 6:00 PM, Andrey Chernov <ache at freebsd.org> wrote:
> 
>> On 13.07.2016 11:53, Mark Millard wrote:
>>> [The below does note that TARGET=powerpc has a mix of signed wchar_t and unsigned char types and most architectures have both being signed types.]
>> 
>> POSIX says nothing about wchar_t and char should be the same (un)signed.
>> It is arm ABI docs may say so only. They are different entities
>> differently encoded and cross assigning between wchar_t and char is not
>> recommended.
> 
> [My "odd" would better have been the longer phrase "unusual for FreeBSD" for the signed type mismatch point.]
> 
> C11 (9899:2011[2012]) and C++11 (14882:2011(E)) agree with your POSIX note: no constraint to have the same signed type status as char.
> 
> But when I then looked at the "System V Application Binary Interface PowerPC Processor Supplement" (1995-Sept SunSoft document) that I believe FreeBSD uses for powerpc (32-bit only: TARGET_ARCH=powerpc) it has:
> 
> typedef long wchar_t;
> 
> as part of: Figure 6-39 <stddef.h> (page labeled 6-38).
> 
> While agreeing about the signed-type status for wchar_t this does not agree with FreeBSD 11.0's use of int as the type:
> 
> sys/powerpc/include/_types.h:typedef	int		___wchar_t;
> sys/powerpc/include/_types.h:#define	__WCHAR_MIN	__INT_MIN	/* min value for a wchar_t */
> sys/powerpc/include/_types.h:#define	__WCHAR_MAX	__INT_MAX	/* max value for a wchar_t */
> 
> # clang --target=powerpc-freebsd11 -std=c99 -E -dM  - < /dev/null | more
> . . .
> #define __WCHAR_MAX__ 2147483647
> #define __WCHAR_TYPE__ int
> #define __WCHAR_WIDTH__ 32
> . . .
> 
> I'm not as sure of which document is official for TARGET_ARCH=powerpc64 but using "Power Architecture 64-bit ELF V2 ABI Specification" (Open POWER ABI for Linux Supplement) as an example of what likely is common for that context: 5.1.3 Types Defined in Standard header lists:
> 
> typedef long wchar_t;
> 
> which again does not agree with FreeBSD 11.0's use of int as the type:
> 
> # clang --target=powerpc64-freebsd11 -std=c99 -E -dM  - < /dev/null | more
> . . .
> #define __WCHAR_MAX__ 2147483647
> #define __WCHAR_TYPE__ int
> #define __WCHAR_WIDTH__ 32
> . . .
> 
> 
> ===
> Mark Millard
> markmi at dsl-only.net
> 
> 
>> 
>> On 2016-Jul-11, at 8:57 PM, Andrey Chernov <ache at freebsd.org> wrote:
>> 
>>> On 12.07.2016 5:44, Mark Millard wrote:
>>>> My understanding of the criteria for __WCHAR_MIN and __WCHAR_MAX:
>>>> 
>>>> A) __WCHAR_MIN and __WCHAR_MAX: same type as the integer promotion of
>>>> ___wchar_t (if that is distinct).
>>>> B) __WCHAR_MIN is the low value for ___wchar_t as an integer type; not
>>>> necessarily a valid char value
>>>> C) __WCHAR_MAX is the high value for ___wchar_t as an integer type; not
>>>> necessarily a valid char value
>>> 
>>> It seems you are right about "not a valid char value", I'll back this
>>> change out.
>>> 
>>>> As far as I know arm FreeBSD uses unsigned character types (of whatever
>>>> width).
>>> 
>>> Probably it should be unsigned for other architectures too, clang does
>>> not generate negative values with L'<char>' literals and locale use only
>>> positive values too.
>> 
>> Looking around:
>> 
>> # grep -i wchar sys/*/include/_types.h
>> sys/arm/include/_types.h:typedef	unsigned int	___wchar_t;
>> sys/arm/include/_types.h:#define	__WCHAR_MIN	0		/* min value for a wchar_t */
>> sys/arm/include/_types.h:#define	__WCHAR_MAX	__UINT_MAX	/* max value for a wchar_t */
>> sys/arm64/include/_types.h:typedef	unsigned int	___wchar_t;
>> sys/arm64/include/_types.h:#define	__WCHAR_MIN	0		/* min value for a wchar_t */
>> sys/arm64/include/_types.h:#define	__WCHAR_MAX	__UINT_MAX	/* max value for a wchar_t */
>> sys/mips/include/_types.h:typedef	int		___wchar_t;
>> sys/mips/include/_types.h:#define	__WCHAR_MIN	__INT_MIN	/* min value for a wchar_t */
>> sys/mips/include/_types.h:#define	__WCHAR_MAX	__INT_MAX	/* max value for a wchar_t */
>> sys/powerpc/include/_types.h:typedef	int		___wchar_t;
>> sys/powerpc/include/_types.h:#define	__WCHAR_MIN	__INT_MIN	/* min value for a wchar_t */
>> sys/powerpc/include/_types.h:#define	__WCHAR_MAX	__INT_MAX	/* max value for a wchar_t */
>> sys/riscv/include/_types.h:typedef	int		___wchar_t;
>> sys/riscv/include/_types.h:#define	__WCHAR_MIN	__INT_MIN	/* min value for a wchar_t */
>> sys/riscv/include/_types.h:#define	__WCHAR_MAX	__INT_MAX	/* max value for a wchar_t */
>> sys/sparc64/include/_types.h:typedef	int		___wchar_t;
>> sys/sparc64/include/_types.h:#define	__WCHAR_MIN	__INT_MIN	/* min value for a wchar_t */
>> sys/sparc64/include/_types.h:#define	__WCHAR_MAX	__INT_MAX	/* max value for a wchar_t */
>> sys/x86/include/_types.h:typedef	int		___wchar_t;
>> sys/x86/include/_types.h:#define	__WCHAR_MIN	__INT_MIN	/* min value for a wchar_t */
>> sys/x86/include/_types.h:#define	__WCHAR_MAX	__INT_MAX	/* max value for a wchar_t */
>> 
>> So only arm and arm64 have unsigned wchar_t types.
>> 
>> [NOTE: __CHAR16_TYPE__ and __CHAR32_TYPE__ are always unsigned: in C++11 terms char16_t is like std::uint_least16_t and char32_t is like std::uint_least32_t despite being distinct types. So __CHAR16_TYPE__ and __CHAR32_TYPE__ are ignored below.]
>> 
>> The clang 3.8.0 compiler output has an odd mix for TARGET_ARCH=powerpc and TARGET_ARCH=powerpc64 . . .
>> 
>> armv6 has unsigned types for both char and __WCHAR_TYPE__.
>> aarch64 has unsigned types for both char and __WCHAR_TYPE__.
>> powerpc has unsigned for char but signed for __WCHAR_TYPE__.
>> powerpc64 has unsigned for char but signed for __WCHAR_TYPE__.
>> amd64 has signed types for both char and __WCHAR_TYPE__.
>> i386 has signed types for both char and __WCHAR_TYPE__.
>> mips has signed types for both char and __WCHAR_TYPE__.
>> sparc64 has signed types for both char and __WCHAR_TYPE__.
>> (riscv is not covered by clang as I understand)
>> 
>> The details via compiler #define's. . .
>> 
>> # clang --target=armv6-freebsd11 -std=c99 -E -dM  - < /dev/null | more
>> . . .
>> #define __BYTE_ORDER__ __ORDER_LITTLE_ENDIAN__
>> . . .
>> #define __CHAR_BIT__ 8
>> #define __CHAR_UNSIGNED__ 1
>> . . .
>> #define __WCHAR_MAX__ 4294967295U
>> #define __WCHAR_TYPE__ unsigned int
>> #define __WCHAR_UNSIGNED__ 1
>> #define __WCHAR_WIDTH__ 32
>> . . .
>> 
>> # clang --target=aarch64-freebsd11 -std=c99 -E -dM  - < /dev/null | more
>> . . .
>> #define __BYTE_ORDER__ __ORDER_LITTLE_ENDIAN__
>> . . .
>> #define __CHAR_BIT__ 8
>> #define __CHAR_UNSIGNED__ 1
>> . . .
>> #define __WCHAR_MAX__ 4294967295U
>> #define __WCHAR_TYPE__ unsigned int
>> #define __WCHAR_UNSIGNED__ 1
>> #define __WCHAR_WIDTH__ 32
>> . . .
>> 
>> # clang --target=powerpc-freebsd11 -std=c99 -E -dM  - < /dev/null | more
>> . . .
>> #define __BYTE_ORDER__ __ORDER_BIG_ENDIAN__
>> . . .
>> #define __CHAR_BIT__ 8
>> #define __CHAR_UNSIGNED__ 1
>> . . .
>> #define __WCHAR_MAX__ 2147483647
>> #define __WCHAR_TYPE__ int
>> #define __WCHAR_WIDTH__ 32
>> . . . (note the lack of __WCHAR_UNSIGNED__) . . .
>> 
>> Is powerpc wrong?
>> 
>> # clang --target=powerpc64-freebsd11 -std=c99 -E -dM  - < /dev/null | more
>> . . .
>> #define __BYTE_ORDER__ __ORDER_BIG_ENDIAN__
>> . . .
>> #define __CHAR_BIT__ 8
>> #define __CHAR_UNSIGNED__ 1
>> . . .
>> #define __WCHAR_MAX__ 2147483647
>> #define __WCHAR_TYPE__ int
>> #define __WCHAR_WIDTH__ 32
>> . . . (note the lack of __WCHAR_UNSIGNED__) . . .
>> 
>> Is powerpc64 wrong?
>> 
>> 
>> # clang --target=amd64-freebsd11 -std=c99 -E -dM  - < /dev/null | more
>> . . .
>> #define __BYTE_ORDER__ __ORDER_LITTLE_ENDIAN__
>> . . .
>> #define __CHAR_BIT__ 8
>> . . . (note the lack of __CHAR_UNSIGNED__) . . .
>> 
>> #define __WCHAR_MAX__ 2147483647
>> #define __WCHAR_TYPE__ int
>> #define __WCHAR_WIDTH__ 32
>> . . . (note the lack of __WCHAR_UNSIGNED__) . . .
>> 
>> # clang --target=i386-freebsd11 -std=c99 -E -dM  - < /dev/null | more
>> . . .
>> #define __BYTE_ORDER__ __ORDER_LITTLE_ENDIAN__
>> . . .
>> #define __CHAR_BIT__ 8
>> . . . (note the lack of __CHAR_UNSIGNED__) . . .
>> 
>> #define __WCHAR_MAX__ 2147483647
>> #define __WCHAR_TYPE__ int
>> #define __WCHAR_WIDTH__ 32
>> . . . (note the lack of __WCHAR_UNSIGNED__) . . .
>> 
>> 
>> # clang --target=mips-freebsd11 -std=c99 -E -dM  - < /dev/null | more
>> . . .
>> #define __BYTE_ORDER__ __ORDER_BIG_ENDIAN__
>> . . .
>> #define __CHAR_BIT__ 8
>> . . . (note the lack of __CHAR_UNSIGNED__) . . .
>> 
>> #define __WCHAR_MAX__ 2147483647
>> #define __WCHAR_TYPE__ int
>> #define __WCHAR_WIDTH__ 32
>> . . . (note the lack of __WCHAR_UNSIGNED__) . . .
>> 
>> # clang --target=sparc64-freebsd11 -std=c99 -E -dM  - < /dev/null | more
>> . . .
>> #define __BYTE_ORDER__ __ORDER_BIG_ENDIAN__
>> . . .
>> #define __CHAR_BIT__ 8
>> . . . (note the lack of __CHAR_UNSIGNED__) . . .
>> 
>> #define __WCHAR_MAX__ 2147483647
>> #define __WCHAR_TYPE__ int
>> #define __WCHAR_WIDTH__ 32
>> . . . (note the lack of __WCHAR_UNSIGNED__) . . .
>> 
>> 
>> 
>> ===
>> Mark Millard
>> markmi at dsl-only.net
Received on Thu Jul 14 2016 - 07:53:40 UTC
