Re: [rfc] removing -mpreferred-stack-boundary=2 flag for i386?

From: Alexander Best <arundel_at_freebsd.org>
Date: Sat, 24 Dec 2011 12:14:25 +0000

On Sat Dec 24 11, Bruce Evans wrote:
> On Sat, 24 Dec 2011, Alexander Best wrote:
> 
> >On Sat Dec 24 11, Bruce Evans wrote:
> >>On Fri, 23 Dec 2011, Alexander Best wrote:
> >...
> >>>the gcc(1) man page states the following:
> >>>
> >>>"
> >>>This extra alignment does consume extra stack space, and generally
> >>>increases code size.  Code that is sensitive to stack space usage,
> >>>such as embedded systems and operating system kernels, may want to
> >>>reduce the preferred alignment to -mpreferred-stack-boundary=2.
> >>>"
> >>>
> >>>the comment in sys/conf/kern.mk however sorta suggests that the default
> >>>alignment of 4 bytes might improve performance.
> >>
> >>The default stack alignment is 16 bytes, which unimproves performance.
> >
> >maybe the part of the comment in sys/conf/kern.mk, which mentions that a
> >stack alignment of 16 bytes might improve micro benchmark results should be
> >removed. this would prevent people (like me) from thinking that using a
> >stack alignment of 4 bytes is a compromise between size and efficiency. it
> >isn't! currently a stack alignment of 16 bytes has no advantages over one
> >with 4 bytes on i386.
> 
> I think the comment is clear enough.  It mentions all the tradeoffs.
> It is only slightly cryptic in saying that these are tradeoffs and that
> the configuration is our best guess at the best tradeoff -- it just says
> "while" for both.  It goes without saying that we don't use our worst
> guess.  Anyone wanting to change this should run benchmarks and beware
> that micro-benchmarks are especially useless.  The changed comment is not
> so good since it no longer mentions micro-benchmarks or says "while".

if micro benchmark results aren't of any use, why should the claim that the
default stack alignment of 16 bytes might produce a better outcome stay?

it doesn't seem as if anybody has micro benchmarked 16 bytes vs. 4 bytes stack
alignment, until now. so the micro benchmark statement in the comment seems to
be pure speculation. even worse...it indicates that by removing the
-mpreferred-stack-boundary=2 flag, one can gain a performance boost by
sacrificing a few more bytes of kernel (and module) size.

this suggests that the behavior of -mpreferred-stack-boundary=2 vs. not
specifying it loosely equals the semantics of -Os vs. -O2.

i don't see how a 4 byte stack alignment for the kernel has any tradeoffs
against the default 16 byte alignment. so if there are no tradeoffs, the
comment shouldn't imply that there are.
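
for reference, the flag's argument is an exponent, not a byte count:
-mpreferred-stack-boundary=N asks gcc for 2^N byte alignment, so =2 means
4 bytes and the default of =4 means 16 bytes. below is a minimal sketch in C of
that mapping and of how rounding a frame up to the boundary costs stack space.
the helper names are made up for illustration, and the rounding is a simplified
model of the padding, not what gcc actually emits for any given function.

```c
#include <stddef.h>

/*
 * Hypothetical helper: -mpreferred-stack-boundary=N requests a stack
 * alignment of 2^N bytes (N=2 -> 4 bytes, N=4 -> 16 bytes).
 */
size_t boundary_bytes(unsigned n)
{
    return (size_t)1 << n;
}

/*
 * Hypothetical helper: round a frame size up to the next multiple of the
 * alignment.  This is a simplified model; a real frame also holds the
 * return address, saved registers and so on.
 */
size_t padded_frame(size_t frame, unsigned n)
{
    size_t align = boundary_bytes(n);

    /* align is a power of two, so masking rounds up correctly */
    return (frame + align - 1) & ~(align - 1);
}
```

in this toy model a 20 byte frame stays at 20 bytes with N=2 and grows to
32 bytes with N=4, i.e. 12 bytes of padding for that frame.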

cheers.
alex

> 
> >so specifying -mpreferred-stack-boundary=2 on i386 is absolutely mandatory.
> 
> Not mandatory; just an optimization.
> 
> >
> >please see the attached patch, which also introduces a line break in order
> >to describe the stack alignment issue in a paragraph of its own.
> 
> There should also be an empty line for a paragraph break.
> 
> % +# Explicitly prohibit the use of FPU, SSE and other SIMD operations inside the
> % +# kernel itself.  These operations are exclusively reserved for user
> % +# applications.
> 
> This part was actually wronger:
> - these operations are not really reserved, but were just not supported
>   in the kernel
> - they have been supported in the kernel for some time, although anything
>   wanting to use the compiler to generate them would have to do something
>   to kill the options added here.  Kernel code using them must inform the
>   kernel that it is doing so, using fpu_kern*(9undoc), and this is
>   only valid in some contexts (more or less for kernel-only threads)
>   so we still prevent compilers from using them routinely.  The makefile
>   is not the right place to describe any of this.
> 
> Bruce