There is *NO* ABI stability in -head

From: Mateusz Guzik <mjguzik_at_gmail.com>
Date: Mon, 23 Oct 2017 22:35:20 +0200
This is your friendly reminder that in head struct layouts can change
and each update requires you to rebuild *all* modules (including ones
which come from ports). In practice you can get away without it most of
the time, but if in doubt or seeing funny crashes - *recompile* and test
with that.

I'm sending this because in the upcoming weeks struct thread (and
probably more) will start getting fields moved around to improve cache
locality and reduce memory waste.

Both problem types are well known and rather widespread in big
real-world C codebases.

1. memory waste
Consider a 64-bit platform with 32-bit ints and 64-bit pointers
(coincidentally that's e.g. amd64 on *BSD, Linux, Illumos and others):

struct crap {
        int i1;
        void *p1;
        int i2;
        void *p2;
};

Normally fields are aligned to their size. So in particular p1 will be
aligned to *8* bytes. But since sizeof i1 is only 4 bytes, the 4 bytes
after it are straight up wasted as padding. The total size of the
object is 32 bytes.

That is, if an object of type struct crap is at address 0x1000, fields
will be:

0x1000 i1
0x1008 p1
0x1010 i2
0x1018 p2

Instead, the same fields can be reshuffled:
struct crap2 {
        int i1;
        int i2;
        void *p1;
        void *p2;
};

With offsets:

0x1000 i1
0x1004 i2
0x1008 p1
0x1010 p2

This is only 24 bytes. The two ints can be placed together, and since
they add up to 8 bytes the p1 pointer gets the right alignment without
extra padding.
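
To double check this on your platform, here is a minimal standalone
program (just the toy structs from above, assuming the LP64 layout
described earlier) which prints the sizes and the offset of p1 in both
variants:

#include <stddef.h>
#include <stdio.h>

struct crap {
        int i1;
        void *p1;
        int i2;
        void *p2;
};

struct crap2 {
        int i1;
        int i2;
        void *p1;
        void *p2;
};

int
main(void)
{
        /* expected on amd64: size 32, p1 at 8 for crap; size 24, p1 at 8 for crap2 */
        printf("crap:  size %zu, p1 at offset %zu\n",
            sizeof(struct crap), offsetof(struct crap, p1));
        printf("crap2: size %zu, p1 at offset %zu\n",
            sizeof(struct crap2), offsetof(struct crap2, p1));
        return (0);
}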

struct thread accumulated some of this and can just shrink without
removing anything.

Interested parties can read http://www.catb.org/esr/structure-packing/

2. cacheline bouncing (professional term: cacheline ping pong)

CPUs store main memory content in local caches. The smallest unit a
CPU reads is a 64-byte cache line (aligned to 64 bytes, i.e. reading
0x1010 will fetch the line starting at 0x1000).
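
For example (a trivial sketch, assuming the 64-byte line size mentioned
above), the start of the line an address lives on is just the address
with the low 6 bits masked off:

#include <stdint.h>

/* 64-byte lines assumed; cache_line_base(0x1010) == 0x1000 */
static inline uintptr_t
cache_line_base(uintptr_t addr)
{
        return (addr & ~(uintptr_t)(64 - 1));
}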

There are fields which are accessed only by the thread owning the
struct. If they happen to share a line with something modified by
other threads, we lose performance as the CPU now has to talk to some
other CPU which has the line in a modified state. This is increasingly
painful on NUMA systems, where response times are longer.

Furthermore, if fields frequently read/modified together are very far
apart, chances are they require avoidable memory fetches - instead of
taking just one line, they may take several. As cache size is finite,
this may mean something else useful has to be evicted.
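
A common way to deal with the first problem (this is only a sketch with
made-up field names, not the actual struct thread layout) is to keep
the owner-only hot fields next to each other and to push anything
written by other CPUs onto its own cache line, e.g. with an alignment
specifier:

#include <stdalign.h>

/*
 * Hypothetical example.  The first group is touched only by the owning
 * thread; the counter updated by other CPUs gets a 64-byte line of its
 * own so those writes do not keep invalidating the hot line.
 */
struct example {
        int     state;          /* owner only */
        int     flags;          /* owner only */
        void    *data;          /* owner only */

        alignas(64) long remote_counter; /* written by other CPUs */
};

This trades some size (the struct is now padded out to a multiple of 64
bytes) for fewer cross-CPU invalidations; the kernel tree has the
__aligned() macro and CACHE_LINE_SIZE for the same purpose.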

For interested parties I can't recommend enough:
https://www.kernel.org/pub/linux/kernel/people/paulmck/perfbook/perfbook.html

-- 
Mateusz Guzik <mjguzik gmail.com>