Thanks for these. I came across them when writing some game engine code a
few years back. I really enjoy this stuff because I find it downright
obnoxious that code gets slower as CPU power increases!

On Tue, Oct 24, 2017 at 4:35 AM, Mateusz Guzik <mjguzik_at_gmail.com> wrote:

> This is your friendly reminder that in head struct layouts can change
> and each update requires you to rebuild *all* modules (including ones
> which come from ports). In practice you can get away without it most of
> the time, but if in doubt or seeing funny crashes - *recompile* and test
> with that.
>
> I'm sending this because in the upcoming weeks struct thread (and probably
> more) will start getting fields moved around to improve cache locality
> and reduce memory waste.
>
> Both problem types are well known and rather widespread in big
> real-world C codebases.
>
> 1. memory waste
>
> Consider a 64-bit platform with 32-bit ints and 64-bit pointers
> (coincidentally that's e.g. amd64 on *BSD, Linux, Illumos and others):
>
> struct crap {
>         int i1;
>         void *p1;
>         int i2;
>         void *p2;
> };
>
> Normally fields are aligned to their size. So in particular p1 will be
> aligned to *8* bytes. But since sizeof i1 is only 4 bytes, there are
> another 4 bytes straight up wasted. The same happens after i2, so the
> total size of the obj is 32 bytes.
>
> That is, if an object of type struct crap is at address 0x1000, fields
> will be:
>
> 0x1000 i1
> 0x1008 p1
> 0x1010 i2
> 0x1018 p2
>
> Instead, the same can be reshuffled:
>
> struct crap2 {
>         int i1;
>         int i2;
>         void *p1;
>         void *p2;
> };
>
> With offsets:
>
> 0x1000 i1
> 0x1004 i2
> 0x1008 p1
> 0x1010 p2
>
> This is only 24 bytes. 2 ints can be placed together and since they add
> up to 8 the p1 pointer gets the right alignment without extra padding.
>
> struct thread accumulated some of this and can just shrink without
> removing anything.
>
> Interested parties can read http://www.catb.org/esr/structure-packing/
>
> 2. cacheline bouncing (professional term: cacheline ping pong)
>
> CPUs store main memory content in local caches. The smallest unit a CPU
> reads is 64 bytes (aligned to 64, i.e. a read of 0x1010 will fetch the
> line starting at 0x1000).
>
> There are fields which are accessed only by the thread owning the
> struct. If they happen to share the line with something modified by
> other threads we lose on performance as now the CPU has to talk to
> some other CPU which has the line modified. This is increasingly painful
> on NUMA systems, where response times are longer.
>
> Furthermore, if fields frequently read/modified together are very far
> apart, chances are they require avoidable memory fetches - instead of
> taking just one line, they may take several. As cache size is finite,
> this may mean something else useful has to be evicted.
>
> For interested parties I can't recommend enough:
> https://www.kernel.org/pub/linux/kernel/people/paulmck/perfbook/perfbook.html
>
> --
> Mateusz Guzik <mjguzik gmail.com>
> _______________________________________________
> freebsd-current_at_freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe_at_freebsd.org"
>
Received on Mon Oct 23 2017 - 19:19:25 UTC