On Sat Dec 24 11, Bruce Evans wrote:
> On Fri, 23 Dec 2011, Alexander Best wrote:
>
> >is -mpreferred-stack-boundary=2 really necessary for i386 builds any
> >longer?
> >i built GENERIC (including modules) with and without that flag. the results
> >are:
>
> The same as it has always been.  It avoids some bloat.
>
> >1654496 bytes with the flag set
> >vs.
> >1654952 bytes with the flag unset
>
> I don't believe this.  GENERIC is enormously bloated, so it has size
> more like 16MB than 1.6MB.  Even a savings of 4K instead of 456 bytes

I'm sorry. I used du(1) to get those numbers, so I believe they are counts
of 512-byte blocks rather than bytes. If I'm correct, GENERIC is even more
bloated than you feared and almost reaches 1GB:

807.859375 megabytes with the flag set
vs.
808.0820313 megabytes without the flag set

> is hard to believe.  I get a savings of 9K (text) in a 5MB kernel.
> Changing the default target arch from i386 to pentium-undocumented has
> reduced the text space savings a little, since the default for passing
> args is now to preallocate stack space for them and store to this,
> instead of to push them; this preallocation results in more functions
> needing to allocate some stack space explicitly, and when some is
> allocated explicitly, the text space cost for this doesn't depend on
> the size of the allocation.
>
> Anyway, the savings are mostly from avoiding cache misses from
> sparse allocation on stacks.
>
> Also, FreeBSD-i386 hasn't been programmed to support aligned stacks:
> - KSTACK_PAGES on i386 is 2, while on amd64 it is 4.  Using more
>   stack might push something over the edge
> - not much care is taken to align the initial stack or to keep the
>   stack aligned in calls from asm code.  E.g., any alignment for
>   mi_startup() (and thus proc0?) is accidental.  This may result
>   in perfect alignment or perfect misalignment.  Hopefully, more
>   care is taken with thread startup.  For gcc, the alignment is
>   done bogusly in main() in userland, but there is no main() in
>   the kernel.  The alignment doesn't matter much (provided the
>   perfect misalignment is still to a multiple of 4), but when it
>   matters, the random misalignment that results from not trying to
>   do it at all is better than perfect misalignment from getting it
>   wrong.  With 4-byte alignment, the only cases that it helps are
>   with 64-bit variables.
>
> >the gcc(1) man page states the following:
> >
> >"
> >This extra alignment does consume extra stack space, and generally
> >increases code size.  Code that is sensitive to stack space usage,
> >such as embedded systems and operating system kernels, may want to
> >reduce the preferred alignment to -mpreferred-stack-boundary=2.
> >"
> >
> >the comment in sys/conf/kern.mk however sorta suggests that the default
> >alignment of 4 bytes might improve performance.
>
> The default stack alignment is 16 bytes, which unimproves performance.
>
> clang handles stack alignment correctly (only does it when it is needed)
> so it doesn't need a -mpreferred-stack-boundary option and doesn't
> always break without alignment in main().  Well, at least it used to,
> IIRC.  Testing it now shows that it does the necessary andl of the
> stack pointer for __aligned(32), but for __aligned(16) it now assumes
> that the stack is aligned by the caller.  So it now needs
> -mpreferred-stack-boundary=2, but doesn't have it.  OTOH, clang doesn't
> do the andl in main() like gcc does (unless you put a dummy __aligned(32)
> there), but requires crt to pass an aligned stack.
>
> Bruce

Received on Sat Dec 24 2011 - 08:12:50 UTC
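The du(1) figures quoted in the message can be sanity-checked with a few lines of Python. This is only a sketch of the arithmetic: it assumes, as the reply does, that du(1) reported 512-byte blocks, and the function name `blocks_to_mib` is made up for illustration.

```python
# Assumption (from the reply above): du(1) printed counts of 512-byte blocks.
BLOCK_SIZE = 512

def blocks_to_mib(blocks, block_size=BLOCK_SIZE):
    """Convert a du(1) block count to mebibytes (2**20 bytes)."""
    return blocks * block_size / (1024 * 1024)

with_flag = 1654496     # blocks, built with -mpreferred-stack-boundary=2
without_flag = 1654952  # blocks, built without the flag

print(blocks_to_mib(with_flag))                  # size with the flag, in MiB
print(blocks_to_mib(without_flag))               # size without the flag, in MiB
print((without_flag - with_flag) * BLOCK_SIZE)   # difference in bytes
```

Under that assumption the difference of 456 is a block count, so the gap between the two builds is 456 * 512 bytes rather than 456 bytes.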
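For readers unfamiliar with the option discussed in the thread: -mpreferred-stack-boundary=N asks gcc to keep the stack aligned to 2 raised to N bytes, and the "andl of the stack pointer" mentioned above rounds the pointer down to such a boundary. The following Python sketch models only that arithmetic. It is not compiler output, the helper names are invented for illustration, and the stack pointer value is hypothetical.

```python
def boundary_bytes(n):
    """Alignment in bytes implied by -mpreferred-stack-boundary=n (2 raised to n)."""
    return 1 << n

def align_down(sp, alignment):
    """Round sp down to a multiple of alignment (a power of two).

    The x86 stack grows downward, so aligning it means clearing the low
    bits, which is the effect of an `andl $-alignment, %esp` instruction.
    """
    return sp & ~(alignment - 1)

sp = 0xBFBFE9DC  # hypothetical stack pointer, already a multiple of 4

for n in (2, 4, 5):
    alignment = boundary_bytes(n)
    aligned = align_down(sp, alignment)
    print(f"boundary={n} ({alignment:2d} bytes): "
          f"{aligned:#x}, padding {sp - aligned} bytes")
```

The padding column gives a rough feel for the stack space that larger boundaries can consume per realignment, which is the cost the quoted gcc(1) passage warns about.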