Re: dereferencing type-punned pointer will break strict-aliasing rules

From: Bruce Evans <bde_at_zeta.org.au>
Date: Tue, 29 Jul 2003 16:37:08 +1000 (EST)
On Mon, 28 Jul 2003, Thomas Moestl wrote:

> On Mon, 2003/07/28 at 09:30:08 +0900, Jun Kuriyama wrote:
> >
> > Is this caused by -oS option?
> >
> > ----- in making BOOTMFS in make release
> > cc -c -Os -pipe -Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes  -Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual  -fformat-extensions -std=c99  -nostdinc -I-  -I. -I/usr/src/sys -I/usr/src/sys/dev -I/usr/src/sys/contrib/dev/acpica -I/usr/src/sys/contrib/ipfilter -I/usr/src/sys/contrib/dev/ath -I/usr/src/sys/contrib/dev/ath/freebsd -D_KERNEL -include opt_global.h -fno-common -finline-limit=15000  -mno-align-long-strings -mpreferred-stack-boundary=2 -ffreestanding -Werror  /usr/src/sys/geom/geom_dev.c
> > /usr/src/sys/geom/geom_dev.c: In function `g_dev_open':
> > /usr/src/sys/geom/geom_dev.c:198: warning: dereferencing type-punned pointer will break strict-aliasing rules
> > [...]
>
> Yes, by implying -fstrict-aliasing, so using -fno-strict-aliasing is a
> workaround.

gcc.info claims that -fstrict-aliasing is implied by -Wall, but this is
apparently wrong -- adding -fstrict-aliasing to our standard warning flags
gives the same warnings.  -Os gives it since -Os is more like -O2 than -O,
and gcc.info's claim that -fstrict-aliasing is implied by -O2 is apparently
not wrong.  Related bug: our warning flags are missing other new gcc warnings.

> The problem is caused by the i386 PCPU_GET/PCPU_SET
> implementation:
>
> 	#define	__PCPU_GET(name) ({						\
> 		__pcpu_type(name) __result;					\
> 										\
> 	[...]
> 		} else if (sizeof(__result) == 4) {				\
> 			u_int __i;						\
> 			__asm __volatile("movl %%fs:%1,%0"			\
> 			    : "=r" (__i)					\
> 			    : "m" (*(u_int *)(__pcpu_offset(name))));		\
> 			__result = *(__pcpu_type(name) *)&__i;			\
> 	[...]
>
> In this case, the PCPU_GET is used to retrieve curthread, causing
> sizeof(__result) to be 4, so the cast at the end of the code snippet
> is from a u_int * to struct thread *, and __i is accessed through the
> casted pointer, which violates the C99 aliasing rules.
> An alternative is to type-pun via a union, which is also a bit ugly,
> but explicitly allowed by C99. Patch attached (but only superficially
> tested).

Uh, this macro is hideous enough without making it larger.  I found that
using __builtin_memcpy() avoids the warning at little or no cost in
object code size (the memcpy gets optimized away in almost the same way
as the assignment).  Then I tried to improve the macro.  I didn't quite
succeed -- the smller versions gave slightly larger object code for
5-20 out of 300 object files that depend on pcpu.h.

Annotated version:

% Index: pcpu.h
% ===================================================================
% RCS file: /home/ncvs/src/sys/i386/include/pcpu.h,v
% retrieving revision 1.35
% diff -u -2 -r1.35 pcpu.h
% --- pcpu.h	1 Oct 2002 14:01:58 -0000	1.35
% +++ pcpu.h	28 Jul 2003 14:02:30 -0000
% _at__at_ -78,5 +80,5 _at__at_
%   * Evaluates to the address of the per-cpu variable name.
%   */
% -#define	__PCPU_PTR(name) ({						\
% +#define	__PCPU_PTR(name) __extension__ ({				\

Unrelated change.  This prevents warnings when compiled with certain
strict warning flags (-pedantic?).

%  	__pcpu_type(name) *__p;						\
%  									\
% _at__at_ -96,21 +98,21 _at__at_
%  									\
%  	if (sizeof(__result) == 1) {					\
% -		u_char __b;						\
% -		__asm __volatile("movb %%fs:%1,%0"			\
% -		    : "=r" (__b)					\
% -		    : "m" (*(u_char *)(__pcpu_offset(name))));		\
% -		__result = *(__pcpu_type(name) *)&__b;			\
% +		__pcpu_type(name) __res;				\

The temporary variable with a basic type is to avoid gcc getting confused
doing reloads in dead code.  E.g., if __pcpu_type(name) is "struct timeval",
then the type can't be loaded into a byte register, and gcc would die
trying since it apparently tries to load things in dead code, especially
in asms.  However, this problem doesn't seem to affect __PCPU_GET().
There is no problem loading the source operand since only its address
can really be loaded, and no problem loading the target operand since the
asm does it.  There might be a problem unloading the target operand, but
there apparently isn't one for dead code.  This should be tested with -O0.

So we don't need the temporary variable hack and can use the natural
type for __PCPU_GET().  This goes most of the way towards making the
code the same for all type sizes 1/2/4.  I still use a temporary variable
and (now literally) duplicated code since it somehow gives slightly
better object code.

% +		__asm __volatile("mov%L0 %%fs:%1,%0"			\

"%L0" is to avoid spelling out the operand size.  This makes this line
literally the same for all cases.

% +		    : "=r" (__res)					\

The temporary variable is now spelled `__res' for all cases.

% +		    : "m" (*(__pcpu_type(name) *)(__pcpu_offset(name)))); \

The type is now spelled `__pcpu_type(name)' for all cases.

% +		__result = *(__pcpu_type(name) *)&__res;		\

`__result' and `__res' now have the same type (they could even be the
same variable), so this could be a simple assignment now.  However, type
aliasing as above or using __builtin_memcpy() somehow gives better code.
And (IIRC) at least the above gives binary identical code.

%  	} else if (sizeof(__result) == 2) {				\
% -		u_short __w;						\
% -		__asm __volatile("movw %%fs:%1,%0"			\
% -		    : "=r" (__w)					\
% -		    : "m" (*(u_short *)(__pcpu_offset(name))));		\
% -		__result = *(__pcpu_type(name) *)&__w;			\
% +		__pcpu_type(name) __res;				\
% +		__asm __volatile("mov%L0 %%fs:%1,%0"			\
% +		    : "=r" (__res)					\
% +		    : "m" (*(__pcpu_type(name) *)(__pcpu_offset(name)))); \
% +		__result = *(__pcpu_type(name) *)&__res;		\
%  	} else if (sizeof(__result) == 4) {				\
% -		u_int __i;						\
% -		__asm __volatile("movl %%fs:%1,%0"			\
% -		    : "=r" (__i)					\
% -		    : "m" (*(u_int *)(__pcpu_offset(name))));		\
% -		__result = *(__pcpu_type(name) *)&__i;			\
% +		__pcpu_type(name) __res;				\
% +		__asm __volatile("mov%L0 %%fs:%1,%0"			\
% +		    : "=r" (__res)					\
% +		    : "m" (*(__pcpu_type(name) *)(__pcpu_offset(name)))); \
% +		__result = *(__pcpu_type(name) *)&__res;		\

Similar changes for sizes 2 and 4.  Collapsing the now-identical code for
the size 1/2/4 cases somehow gives worse object code.

%  	} else {							\
%  		__result = *__PCPU_PTR(name);				\
% _at__at_ -123,29 +125,29 _at__at_
%   * Sets the value of the per-cpu variable name to value val.
%   */
% -#define	__PCPU_SET(name, val) ({					\
% +#define	__PCPU_SET(name, val) {						\

Warnings could be prevented by declaring the expression-statment as an
__extension, but since __PCPU_SET() doesn't return anything, using an
expression statement for it is just bogus; don't use one.

%  	__pcpu_type(name) __val = (val);				\
%  									\
%  	if (sizeof(__val) == 1) {					\
%  		u_char __b;						\

Basic types are still needed for __PCPU_SET().

% -		__b = *(u_char *)&__val;				\
% -		__asm __volatile("movb %1,%%fs:%0"			\
% -		    : "=m" (*(u_char *)(__pcpu_offset(name)))		\
% +		__builtin_memcpy(&__b, &__val, sizeof(__val));		\

This avoids the aliasing problem.

% +		__asm __volatile("mov%L1 %1,%%fs:%0"			\

"%L1" is to avoid spelling out the operand size.  This makes this line
literally the same for all cases.

% +		    : "=m" (*(__pcpu_type(name) *)(__pcpu_offset(name))) \

A basic type is not needed for this load, so we can use the natural type.
This makes this line literally the same for all cases.

%  		    : "r" (__b));					\
%  	} else if (sizeof(__val) == 2) {				\
%  		u_short __w;						\
% -		__w = *(u_short *)&__val;				\
% -		__asm __volatile("movw %1,%%fs:%0"			\
% -		    : "=m" (*(u_short *)(__pcpu_offset(name)))		\
% +		__builtin_memcpy(&__w, &__val, sizeof(__val));		\
% +		__asm __volatile("mov%L1 %1,%%fs:%0"			\
% +		    : "=m" (*(__pcpu_type(name) *)(__pcpu_offset(name))) \
%  		    : "r" (__w));					\
%  	} else if (sizeof(__val) == 4) {				\
%  		u_int __i;						\
% -		__i = *(u_int *)&__val;					\
% -		__asm __volatile("movl %1,%%fs:%0"			\
% -		    : "=m" (*(u_int *)(__pcpu_offset(name)))		\
% +		__builtin_memcpy(&__i, &__val, sizeof(__val));		\
% +		__asm __volatile("mov%L1 %1,%%fs:%0"			\
% +		    : "=m" (*(__pcpu_type(name) *)(__pcpu_offset(name))) \
%  		    : "r" (__i));					\

Similarly.  The code is now only mostly the same for the size 1/2/4 cases.

%  	} else {							\
%  		*__PCPU_PTR(name) = __val;				\

__PCPU_SET() is not used very much, so maybe it should be optimized for
code simplicity by always going through the pointer.

BTW, how does locking for accesses through the pointer work?  I think it
can only work due to accidental CPU affinity if Giant is not held.

%  	}								\
% -})
% +}

Part of not using an expression-statement for __PCPU_SET().

%
%  #define	PCPU_GET(member)	__PCPU_GET(pc_ ## member)

Unannotated version:

%%%
Index: pcpu.h
===================================================================
RCS file: /home/ncvs/src/sys/i386/include/pcpu.h,v
retrieving revision 1.35
diff -u -2 -r1.35 pcpu.h
--- pcpu.h	1 Oct 2002 14:01:58 -0000	1.35
+++ pcpu.h	28 Jul 2003 14:02:30 -0000
_at__at_ -78,5 +80,5 _at__at_
  * Evaluates to the address of the per-cpu variable name.
  */
-#define	__PCPU_PTR(name) ({						\
+#define	__PCPU_PTR(name) __extension__ ({				\
 	__pcpu_type(name) *__p;						\
 									\
_at__at_ -96,21 +98,21 _at__at_
 									\
 	if (sizeof(__result) == 1) {					\
-		u_char __b;						\
-		__asm __volatile("movb %%fs:%1,%0"			\
-		    : "=r" (__b)					\
-		    : "m" (*(u_char *)(__pcpu_offset(name))));		\
-		__result = *(__pcpu_type(name) *)&__b;			\
+		__pcpu_type(name) __res;				\
+		__asm __volatile("mov%L0 %%fs:%1,%0"			\
+		    : "=r" (__res)					\
+		    : "m" (*(__pcpu_type(name) *)(__pcpu_offset(name)))); \
+		__result = *(__pcpu_type(name) *)&__res;		\
 	} else if (sizeof(__result) == 2) {				\
-		u_short __w;						\
-		__asm __volatile("movw %%fs:%1,%0"			\
-		    : "=r" (__w)					\
-		    : "m" (*(u_short *)(__pcpu_offset(name))));		\
-		__result = *(__pcpu_type(name) *)&__w;			\
+		__pcpu_type(name) __res;				\
+		__asm __volatile("mov%L0 %%fs:%1,%0"			\
+		    : "=r" (__res)					\
+		    : "m" (*(__pcpu_type(name) *)(__pcpu_offset(name)))); \
+		__result = *(__pcpu_type(name) *)&__res;		\
 	} else if (sizeof(__result) == 4) {				\
-		u_int __i;						\
-		__asm __volatile("movl %%fs:%1,%0"			\
-		    : "=r" (__i)					\
-		    : "m" (*(u_int *)(__pcpu_offset(name))));		\
-		__result = *(__pcpu_type(name) *)&__i;			\
+		__pcpu_type(name) __res;				\
+		__asm __volatile("mov%L0 %%fs:%1,%0"			\
+		    : "=r" (__res)					\
+		    : "m" (*(__pcpu_type(name) *)(__pcpu_offset(name)))); \
+		__result = *(__pcpu_type(name) *)&__res;		\
 	} else {							\
 		__result = *__PCPU_PTR(name);				\
_at__at_ -123,29 +125,29 _at__at_
  * Sets the value of the per-cpu variable name to value val.
  */
-#define	__PCPU_SET(name, val) ({					\
+#define	__PCPU_SET(name, val) {						\
 	__pcpu_type(name) __val = (val);				\
 									\
 	if (sizeof(__val) == 1) {					\
 		u_char __b;						\
-		__b = *(u_char *)&__val;				\
-		__asm __volatile("movb %1,%%fs:%0"			\
-		    : "=m" (*(u_char *)(__pcpu_offset(name)))		\
+		__builtin_memcpy(&__b, &__val, sizeof(__val));		\
+		__asm __volatile("mov%L1 %1,%%fs:%0"			\
+		    : "=m" (*(__pcpu_type(name) *)(__pcpu_offset(name))) \
 		    : "r" (__b));					\
 	} else if (sizeof(__val) == 2) {				\
 		u_short __w;						\
-		__w = *(u_short *)&__val;				\
-		__asm __volatile("movw %1,%%fs:%0"			\
-		    : "=m" (*(u_short *)(__pcpu_offset(name)))		\
+		__builtin_memcpy(&__w, &__val, sizeof(__val));		\
+		__asm __volatile("mov%L1 %1,%%fs:%0"			\
+		    : "=m" (*(__pcpu_type(name) *)(__pcpu_offset(name))) \
 		    : "r" (__w));					\
 	} else if (sizeof(__val) == 4) {				\
 		u_int __i;						\
-		__i = *(u_int *)&__val;					\
-		__asm __volatile("movl %1,%%fs:%0"			\
-		    : "=m" (*(u_int *)(__pcpu_offset(name)))		\
+		__builtin_memcpy(&__i, &__val, sizeof(__val));		\
+		__asm __volatile("mov%L1 %1,%%fs:%0"			\
+		    : "=m" (*(__pcpu_type(name) *)(__pcpu_offset(name))) \
 		    : "r" (__i));					\
 	} else {							\
 		*__PCPU_PTR(name) = __val;				\
 	}								\
-})
+}

 #define	PCPU_GET(member)	__PCPU_GET(pc_ ## member)
%%%

Bruce
Received on Mon Jul 28 2003 - 21:37:31 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:37:16 UTC