On Sat Dec 24 11, Bruce Evans wrote:
> On Fri, 23 Dec 2011, Alexander Best wrote:
>
> >is -mpreferred-stack-boundary=2 really necessary for i386 builds any
> >longer?
> >i built GENERIC (including modules) with and without that flag. the results
> >are:
>
> The same as it has always been. It avoids some bloat.
>
> >1654496 bytes with the flag set
> >vs.
> >1654952 bytes with the flag unset
>
> I don't believe this. GENERIC is enormously bloated, so it has size
> more like 16MB than 1.6MB. Even a savings of 4K instead of 456 bytes
> is hard to believe. I get a savings of 9K (text) in a 5MB kernel.
> Changing the default target arch from i386 to pentium-undocumented has
> reduced the text space savings a little, since the default for passing
> args is now to preallocate stack space for them and store to this,
> instead of to push them; this preallocation results in more functions
> needing to allocate some stack space explicitly, and when some is
> allocated explicitly, the text space cost for this doesn't depend on
> the size of the allocation.
>
> Anyway, the savings are mostly from avoiding cache misses from
> sparse allocation on stacks.
>
> Also, FreeBSD-i386 hasn't been programmed to support aligned stacks:
> - KSTACK_PAGES on i386 is 2, while on amd64 it is 4. Using more
>   stack might push something over the edge
> - not much care is taken to align the initial stack or to keep the
>   stack aligned in calls from asm code. E.g., any alignment for
>   mi_startup() (and thus proc0?) is accidental. This may result
>   in perfect alignment or perfect misalignment. Hopefully, more
>   care is taken with thread startup. For gcc, the alignment is
>   done bogusly in main() in userland, but there is no main() in
>   the kernel.
>   The alignment doesn't matter much (provided the
>   perfect misalignment is still to a multiple of 4), but when it
>   matters, the random misalignment that results from not trying to
>   do it at all is better than perfect misalignment from getting it
>   wrong. With 4-byte alignment, the only cases that it helps are
>   with 64-bit variables.
>
> >the gcc(1) man page states the following:
> >
> >"
> >This extra alignment does consume extra stack space, and generally
> >increases code size. Code that is sensitive to stack space usage,
> >such as embedded systems and operating system kernels, may want to
> >reduce the preferred alignment to -mpreferred-stack-boundary=2.
> >"
> >
> >the comment in sys/conf/kern.mk however sorta suggests that the default
> >alignment of 4 bytes might improve performance.
>
> The default stack alignment is 16 bytes, which unimproves performance.

maybe the part of the comment in sys/conf/kern.mk which mentions that a stack
alignment of 16 bytes might improve micro benchmark results should be removed.
this would prevent people (like me) from thinking that a stack alignment of
4 bytes is a compromise between size and efficiency. it isn't! currently a
stack alignment of 16 bytes has no advantages over one of 4 bytes on i386.

so specifying -mpreferred-stack-boundary=2 on i386 is absolutely mandatory.

please see the attached patch, which also introduces a line break in order to
describe the stack alignment issue in a paragraph of its own.

cheers.
alex

>
> clang handles stack alignment correctly (only does it when it is needed)
> so it doesn't need a -mpreferred-stack-boundary option and doesn't
> always break without alignment in main(). Well, at least it used to,
> IIRC. Testing it now shows that it does the necessary andl of the
> stack pointer for __aligned(32), but for __aligned(16) it now assumes
> that the stack is aligned by the caller. So it now needs
> -mpreferred-stack-boundary=2, but doesn't have it.
> OTOH, clang doesn't
> do the andl in main() like gcc does (unless you put a dummy __aligned(32)
> there), but requires crt to pass an aligned stack.
>
> Bruce