Re: There is *NO* abi stability in -head

From: blubee blubeeme <gurenchan_at_gmail.com>
Date: Tue, 24 Oct 2017 05:19:24 +0800
Thanks for these; I came across them when writing some game engine code a
few years back.
I really enjoy this stuff because I find it downright obnoxious that code
gets slower as CPU power increases!

On Tue, Oct 24, 2017 at 4:35 AM, Mateusz Guzik <mjguzik_at_gmail.com> wrote:

> This is your friendly reminder that in head, struct layouts can change
> and each update requires you to rebuild *all* modules (including ones
> which come from ports). In practice you can get away without it most of
> the time, but if in doubt or seeing funny crashes, *recompile* and test
> with that.
>
> I'm sending this because in the upcoming weeks struct thread (and probably
> more) will start getting fields moved around to improve cache locality
> and reduce memory waste.
>
> Both problem types are well known and rather widespread in big
> real-world C codebases.
>
> 1. memory waste
> Consider a 64-bit platform with 32-bit ints and 64-bit pointers
> (coincidentally that's e.g. amd64 on *BSD, Linux, Illumos and others):
>
> struct crap {
>         int i1;
>         void *p1;
>         int i2;
>         void *p2;
> };
>
> Normally fields are aligned to their size, so in particular p1 will be
> aligned to *8* bytes. Since sizeof(i1) is only 4 bytes, another 4 bytes
> of padding are straight up wasted between i1 and p1, and the same
> happens between i2 and p2. The total size of the object is 32 bytes.
>
> That is, if an object of type struct crap is at address 0x1000, fields
> will be:
>
> 0x1000 i1
> 0x1008 p1
> 0x1010 i2
> 0x1018 p2
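A minimal sketch, assuming an LP64 target such as amd64, that lets the compiler report the layout of struct crap instead of working it out by hand. The function name print_crap_layout is made up for illustration; offsetof is standard C.

```c
/* Sketch: ask the compiler for the layout of struct crap.
 * Assumes an LP64 target (32-bit int, 64-bit pointers), e.g. amd64. */
#include <assert.h>
#include <stddef.h>   /* offsetof */
#include <stdio.h>

struct crap {
        int i1;
        void *p1;
        int i2;
        void *p2;
};

/* Print each field's address as if the object sat at 0x1000,
 * mirroring the listing in the text. Hypothetical helper. */
void
print_crap_layout(void)
{
        const size_t base = 0x1000;

        printf("%#zx i1\n", base + offsetof(struct crap, i1));
        printf("%#zx p1\n", base + offsetof(struct crap, p1));
        printf("%#zx i2\n", base + offsetof(struct crap, i2));
        printf("%#zx p2\n", base + offsetof(struct crap, p2));
        printf("sizeof(struct crap) = %zu\n", sizeof(struct crap));
}
```

Called from main() on amd64, this prints the same four addresses as the listing, followed by a size of 32.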
>
> Instead, the same can be reshuffled:
> struct crap2 {
>         int i1;
>         int i2;
>         void *p1;
>         void *p2;
> };
>
> With offsets:
>
> 0x1000 i1
> 0x1004 i2
> 0x1008 p1
> 0x1010 p2
>
> This is only 24 bytes. The two ints can be placed together, and since
> they add up to 8 bytes the p1 pointer gets the right alignment without
> extra padding.
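The saving can be measured directly. The sketch below assumes an LP64 target. It computes how many bytes of each struct are padding by subtracting the members' combined size from sizeof(). The helper names (CRAP_PAYLOAD, crap_padding, crap2_padding, bytes_saved) are hypothetical.

```c
/* Sketch comparing padding in the two orderings from the text.
 * Assumes an LP64 target such as amd64. */
#include <assert.h>
#include <stddef.h>

struct crap {           /* original order: int, ptr, int, ptr */
        int i1;
        void *p1;
        int i2;
        void *p2;
};

struct crap2 {          /* reshuffled: both ints first */
        int i1;
        int i2;
        void *p1;
        void *p2;
};

/* Bytes actually used by the members; the rest of sizeof() is padding. */
#define CRAP_PAYLOAD (2 * sizeof(int) + 2 * sizeof(void *))

size_t
crap_padding(void)
{
        return sizeof(struct crap) - CRAP_PAYLOAD;
}

size_t
crap2_padding(void)
{
        return sizeof(struct crap2) - CRAP_PAYLOAD;
}

/* Memory saved by the reordering across n objects. */
size_t
bytes_saved(size_t n)
{
        return n * (sizeof(struct crap) - sizeof(struct crap2));
}
```

No member is removed and only the order changes. On amd64, that takes the padding from 8 bytes to 0 and saves 8 bytes per object.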
>
> struct thread accumulated some of this and can just shrink without
> removing anything.
>
> Interested parties can read http://www.catb.org/esr/structure-packing/
>
> 2. cacheline bouncing (professional term: cacheline ping pong)
>
> CPUs store main memory content in local caches. The smallest unit they
> read is a 64-byte cache line (aligned to 64, i.e. reading 0x1010 will
> fetch the whole line starting at 0x1000).
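The rounding described above is a simple mask. This is a sketch that assumes a 64-byte line, as the text does; the real size is hardware-specific. The names cacheline_base and same_cacheline are hypothetical.

```c
/* Sketch: map an address to its cache line.
 * CACHE_LINE_SIZE is assumed to be 64, matching the text; the real
 * value is hardware-specific. */
#include <assert.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64u

/* Rounding down with a mask only works if the size is a power of two. */
_Static_assert((CACHE_LINE_SIZE & (CACHE_LINE_SIZE - 1)) == 0,
    "cache line size must be a power of two");

/* Start of the line containing addr: clear the low 6 bits. */
uintptr_t
cacheline_base(uintptr_t addr)
{
        return addr & ~(uintptr_t)(CACHE_LINE_SIZE - 1);
}

/* Nonzero if a and b fall into the same cache line. */
int
same_cacheline(uintptr_t a, uintptr_t b)
{
        return cacheline_base(a) == cacheline_base(b);
}
```

For example, cacheline_base(0x1010) yields 0x1000, as in the text, while 0x103f and 0x1040 land on different lines.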
>
> There are fields which are accessed only by the thread owning the
> struct. If they happen to share a line with something modified by
> other threads, we lose performance, as now the CPU has to talk to
> some other CPU which has the line modified. This is increasingly painful
> on NUMA systems, where response times are longer.
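A minimal sketch of the usual fix: start the remotely written data on its own line. The structs and their fields are hypothetical and are not taken from struct thread. CACHE_LINE_SIZE is assumed to be 64. The code uses portable C11 _Alignas; FreeBSD's own headers also provide __aligned() and CACHE_LINE_SIZE for the same purpose.

```c
/* Sketch of keeping owner-only fields off the line other threads write.
 * struct worker_* and its fields are hypothetical, not struct thread;
 * CACHE_LINE_SIZE is assumed to be 64. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64

/* With this layout the two fields normally sit on the same line, so a
 * remote write to remote_requests invalidates the owner's copy of the
 * line holding owner_counter. */
struct worker_bad {
        uint64_t owner_counter;     /* touched only by the owning thread */
        uint64_t remote_requests;   /* written by other threads */
};

/* _Alignas pushes remote_requests onto the next line and raises the
 * struct's alignment to 64, so a properly aligned object keeps the two
 * fields on different lines. */
struct worker_good {
        uint64_t owner_counter;
        _Alignas(CACHE_LINE_SIZE) uint64_t remote_requests;
};

/* Line index of a field relative to the start of a line-aligned object. */
#define LINE_OF(type, field) (offsetof(type, field) / CACHE_LINE_SIZE)
```

The price is padding. struct worker_good is 128 bytes for 16 bytes of data, so this works directly against the memory-waste point above. It is worth doing only for fields that really are written remotely. Heap objects also need an allocator that honors the alignment, such as C11 aligned_alloc(). Plain malloc() only guarantees alignment for the fundamental types.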
>
> Furthermore, if fields frequently read or modified together are very far
> apart, chances are they require avoidable memory fetches: instead of
> taking just one line, they may take several. As cache size is finite,
> this may mean something else useful has to be evicted.
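To make the effect concrete, the sketch below counts the cache lines a hot path touches under two layouts of the same fields. The structs and helper names are hypothetical. It assumes 64-byte lines and objects that start on a line boundary.

```c
/* Sketch: count cache lines touched by a hot path for two layouts of
 * the same fields. Structs are hypothetical; CACHE_LINE_SIZE is assumed
 * to be 64 and objects are assumed to start on a line boundary. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64

struct spread {                 /* hot fields far apart */
        uint64_t hot_a;
        char cold[120];         /* rarely used data */
        uint64_t hot_b;
};

struct grouped {                /* same fields, hot ones together */
        uint64_t hot_a;
        uint64_t hot_b;
        char cold[120];
};

/* Distinct lines covered by n fields given as (offset, length) pairs.
 * Fields must be sorted by offset, non-overlapping and non-empty, so
 * line indices never decrease and counting changes counts lines. */
size_t
lines_touched(const size_t *off, const size_t *len, size_t n)
{
        size_t count = 0;
        size_t last = SIZE_MAX;         /* sentinel: no line seen yet */

        for (size_t i = 0; i < n; i++) {
                size_t first = off[i] / CACHE_LINE_SIZE;
                size_t end = (off[i] + len[i] - 1) / CACHE_LINE_SIZE;

                for (size_t l = first; l <= end; l++) {
                        if (l != last) {
                                count++;
                                last = l;
                        }
                }
        }
        return count;
}

size_t
spread_hot_lines(void)
{
        const size_t off[] = { offsetof(struct spread, hot_a),
            offsetof(struct spread, hot_b) };
        const size_t len[] = { sizeof(uint64_t), sizeof(uint64_t) };

        return lines_touched(off, len, 2);
}

size_t
grouped_hot_lines(void)
{
        const size_t off[] = { offsetof(struct grouped, hot_a),
            offsetof(struct grouped, hot_b) };
        const size_t len[] = { sizeof(uint64_t), sizeof(uint64_t) };

        return lines_touched(off, len, 2);
}
```

Both structs are the same size. Only the placement differs, yet the hot path of struct spread pulls in two lines where struct grouped needs one.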
>
> For interested parties I can't recommend enough:
> https://www.kernel.org/pub/linux/kernel/people/paulmck/perfbook/perfbook.html
>
> --
> Mateusz Guzik <mjguzik gmail.com>
> _______________________________________________
> freebsd-current_at_freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe_at_freebsd.org"
>
Received on Mon Oct 23 2017 - 19:19:25 UTC
