This is your friendly reminder that in head, struct layouts can change and each update requires you to rebuild *all* modules (including ones which come from ports). In practice you can get away without it most of the time, but if in doubt or seeing funny crashes, *recompile* and test with that.

I'm sending this because in the upcoming weeks struct thread (and probably more) will start getting fields moved around to improve cache locality and reduce memory waste. Both problem types are well known and rather widespread in big real-world C codebases.

1. memory waste

Consider a 64-bit platform with 32-bit ints and 64-bit pointers (coincidentally that's e.g. amd64 on *BSD, Linux, Illumos and others):

    struct crap {
            int i1;
            void *p1;
            int i2;
            void *p2;
    };

Normally fields are aligned to their size. So in particular p1 will be aligned to *8* bytes. But since i1 is only 4 bytes, another 4 bytes of padding after it are straight up wasted. The same happens between i2 and p2, so the total size of the object is 32 bytes. That is, if an object of type struct crap is at address 0x1000, the fields will be:

    0x1000 i1
    0x1008 p1
    0x1010 i2
    0x1018 p2

Instead, the same fields can be reshuffled:

    struct crap2 {
            int i1;
            int i2;
            void *p1;
            void *p2;
    };

With offsets:

    0x1000 i1
    0x1004 i2
    0x1008 p1
    0x1010 p2

This is only 24 bytes. The 2 ints can be placed together, and since they add up to 8 bytes, the p1 pointer gets the right alignment without extra padding.

struct thread accumulated some of this and can just shrink without removing anything. Interested parties can read http://www.catb.org/esr/structure-packing/

2. cacheline bouncing (professional term: cacheline ping pong)

CPUs store main memory content in local caches. The smallest unit they read is 64 bytes, aligned to 64. For example, reading 0x1010 will fetch the whole line starting at 0x1000. Some fields are accessed only by the thread owning the struct. If they happen to share a line with something modified by other threads, we lose performance, because the CPU now has to talk to another CPU that holds the modified line.
This is increasingly painful on NUMA systems, where response times are longer.

Furthermore, if fields that are frequently read or modified together are very far apart, chances are they require avoidable memory fetches: instead of taking just one line, they may take several. As cache size is finite, this may mean something else useful has to be evicted.

For interested parties I can't recommend enough: https://www.kernel.org/pub/linux/kernel/people/paulmck/perfbook/perfbook.html

-- 
Mateusz Guzik <mjguzik gmail.com>
Received on Mon Oct 23 2017 - 18:35:21 UTC