Re: [rfc] removing -mpreferred-stack-boundary=2 flag for i386?

From: Alexander Best <arundel_at_freebsd.org>
Date: Sat, 24 Dec 2011 09:12:50 +0000

On Sat Dec 24 11, Bruce Evans wrote:
> On Fri, 23 Dec 2011, Alexander Best wrote:
> 
> >is -mpreferred-stack-boundary=2 really necessary for i386 builds any 
> >longer?
> >i built GENERIC (including modules) with and without that flag. the results
> >are:
> 
> The same as it has always been.  It avoids some bloat.
> 
> >1654496	bytes with the flag set
> >vs.
> >1654952	bytes with the flag unset
> 
> I don't believe this.  GENERIC is enormously bloated, so it has size
> more like 16MB than 1.6MB.  Even a savings of 4K instead of 456 bytes

i'm sorry. i used du(1) to get those numbers, so i believe they are
counts of 512-byte blocks. if i'm correct GENERIC is even more bloated
than you feared and almost reaches 1GB:

807.859375 megabytes with the flag set
vs.
808.0820313 megabytes with the flag unset
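
The conversion above can be checked with a minimal Python sketch. It assumes du(1) really did report 512-byte blocks; the helper name blocks_to_mib is made up for illustration.

```python
# Assumption: du(1) reported sizes in 512-byte blocks.
BLOCK_SIZE = 512
MIB = 1024 * 1024


def blocks_to_mib(blocks):
    """Convert a du(1) block count to mebibytes."""
    return blocks * BLOCK_SIZE / MIB


with_flag = 1654496     # blocks, -mpreferred-stack-boundary=2 set
without_flag = 1654952  # blocks, flag unset

print(blocks_to_mib(with_flag))      # 807.859375
print(blocks_to_mib(without_flag))   # 808.08203125
# Difference between the two builds, in KiB.
print((without_flag - with_flag) * BLOCK_SIZE / 1024)  # 228.0
```

Under that assumption the two builds differ by 456 blocks, or about 228 KiB.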

> is hard to believe.  I get a savings of 9K (text) in a 5MB kernel.
> Changing the default target arch from i386 to pentium-undocumented has
> reduced the text space savings a little, since the default for passing
> args is now to preallocate stack space for them and store to this,
> instead of to push them; this preallocation results in more functions
> needing to allocate some stack space explicitly, and when some is
> allocated explicitly, the text space cost for this doesn't depend on
> the size of the allocation.
> 
> Anyway, the savings are mostly from avoiding cache misses from
> sparse allocation on stacks.
> 
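
A rough way to picture the sparse allocation, as a simplified Python model rather than what gcc actually emits: assume each function's frame is rounded up to the preferred boundary, which for -mpreferred-stack-boundary=N is 2**N bytes. The frame sizes and helper names below are invented for illustration.

```python
def boundary_bytes(n):
    """-mpreferred-stack-boundary=n asks for 2**n-byte alignment."""
    return 1 << n


def round_up(size, align):
    """Round size up to the next multiple of align (a power of two)."""
    return (size + align - 1) & ~(align - 1)


# Hypothetical raw frame sizes in bytes for a chain of nested calls.
frames = [12, 20, 36, 8, 52]

for n in (2, 4):
    align = boundary_bytes(n)
    used = sum(round_up(f, align) for f in frames)
    print(f"boundary={n} ({align} bytes): {used} bytes of stack")
# boundary=2 (4 bytes): 128 bytes of stack
# boundary=4 (16 bytes): 176 bytes of stack
```

In this toy case the same call chain spreads over 176 bytes instead of 128, so the live data touches more cache lines.
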
> Also, FreeBSD-i386 hasn't been programmed to support aligned stacks:
> - KSTACK_PAGES on i386 is 2, while on amd64 it is 4.  Using more
>   stack might push something over the edge
> - not much care is taken to align the initial stack or to keep the
>   stack aligned in calls from asm code.  E.g., any alignment for
>   mi_startup() (and thus proc0?) is accidental.  This may result
>   in perfect alignment or perfect misalignment.  Hopefully, more
>   care is taken with thread startup.  For gcc, the alignment is
>   done bogusly in main() in userland, but there is no main() in
>   the kernel.  The alignment doesn't matter much (provided the
>   perfect misalignment is still to a multiple of 4), but when it
>   matters, the random misalignment that results from not trying to
>   do it at all is better than perfect misalignment from getting it
>   wrong.  With 4-byte alignment, the only cases that it helps are
>   with 64-bit variables.
> 
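
The "perfect alignment or perfect misalignment" point can be sketched with a small Python model. It assumes the compiler never realigns and only keeps each call's total stack use a multiple of the boundary, so whatever offset the initial stack has is carried into every frame. The addresses and frame sizes are invented.

```python
ALIGN = 16  # default preferred boundary (2**4 bytes)


def frame_offsets(initial_sp, frame_sizes, align=ALIGN):
    """Return sp % align after each nested call.

    Assumption: no realignment happens; each call just subtracts its
    total stack use (frame plus return address) from sp.
    """
    sp = initial_sp
    offsets = []
    for size in frame_sizes:
        sp -= size
        offsets.append(sp % align)
    return offsets


frames = [32, 48, 16]  # hypothetical per-call stack use, multiples of 16

print(frame_offsets(0xC0001000, frames))  # [0, 0, 0]     aligned start
print(frame_offsets(0xC0000FFC, frames))  # [12, 12, 12]  start off by 4
```

In this model a start that is off by 4 stays off by the same amount in every frame. Since 12 is a multiple of 4, 4-byte alignment still holds, but a 16-byte-aligned object would never land on its boundary.
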
> >the gcc(1) man page states the following:
> >
> >"
> >This extra alignment does consume extra stack space, and generally
> >increases code size.  Code that is sensitive to stack space usage,
> >such as embedded systems and operating system kernels, may want to
> >reduce the preferred alignment to -mpreferred-stack-boundary=2.
> >"
> >
> >the comment in sys/conf/kern.mk however sorta suggests that the default
> >alignment of 4 bytes might improve performance.
> 
> The default stack alignment is 16 bytes, which unimproves performance.
> 
> clang handles stack alignment correctly (only does it when it is needed)
> so it doesn't need a -mpreferred-stack-boundary option and doesn't
> always break without alignment in main().  Well, at least it used to,
> IIRC.  Testing it now shows that it does the necessary andl of the
> stack pointer for __aligned(32), but for __aligned(16) it now assumes
> that the stack is aligned by the caller.  So it now needs
> -mpreferred-stack-boundary=2, but doesn't have it.  OTOH, clang doesn't
> do the andl in main() like gcc does (unless you put a dummy __aligned(32)
> there), but requires crt to pass an aligned stack.
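
The andl mentioned above rounds the stack pointer down to the requested boundary. A minimal Python model of that masking step follows; the stack pointer value is invented and realign is a made-up helper name.

```python
def realign(sp, align):
    """Model `andl $-align, %esp`: round sp down to a multiple of align."""
    return sp & -align


sp = 0xBFFFEC5C  # hypothetical stack pointer at function entry

print(hex(realign(sp, 32)))  # 0xbfffec40
print(hex(realign(sp, 16)))  # 0xbfffec50
print(sp - realign(sp, 32))  # 28 bytes skipped to reach a 32-byte boundary
```

A function that does this itself does not care how its caller left the stack. One that skips it, as described for __aligned(16), is only correct if the caller already passed an aligned stack.
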
> 
> Bruce

Received on Sat Dec 24 2011 - 08:12:50 UTC
