On Wed, Jan 23, 2013 at 05:32:38PM +0100, Luigi Rizzo wrote:
> Probably our compiler folks have some ideas on this...
>
> When doing netmap i found that on FreeBSD memcpy/bcopy was expensive,
> __builtin_memcpy() was even worse, and so i ended up writing
> my custom routine, (called pkt_copy() in the program below).
> This happens with gcc 4.2.1, clang, gcc 4.6.4
>
> I was then surprised to notice that on a recent ubuntu using
> gcc 4.6.2 (if that matters) the __builtin_memcpy beats other
> methods by a large factor.

so, it turns out that in my test program I had swapped the source
and destination operands for __builtin_memcpy(), and this
substantially changed the memory access pattern.

With the correct operands, __builtin_memcpy == memcpy == bcopy
on both FreeBSD and Linux.
On FreeBSD pkt_copy is still faster than the other methods for
small packets, whereas on Linux they are equivalent.

If you are curious why swapping source and dst changed things
so dramatically: the test was supposed to read from a large chunk
of memory (over 1GB) to avoid always hitting L1 or L2.
Swapping operands causes reads to always hit the same line, thus
saving a lot of misses. The difference between the two machines
is then probably due to how the cache is used on writes.

sorry for the noise.
luigi

Received on Thu Jan 24 2013 - 01:55:05 UTC
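
To make the access pattern above concrete, here is a minimal sketch of the two copy loops. It is not the original test program: copy_loop(), PKT_LEN and SLOT are hypothetical names, and the sizes are picked only for illustration (the real test walked over 1GB). It counts how often the source address changes between copies, which is a rough stand-in for how many different cache lines the reads touch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sizes, not taken from the original test program. */
enum { PKT_LEN = 64, SLOT = 2048 };

/*
 * Copy one PKT_LEN-byte packet per slot of `big` (nslots * SLOT bytes).
 * swapped == 0: read from successive slots of `big`, write to `small`.
 * swapped != 0: operands exchanged, so every read comes from `small`
 *               and the writes walk through `big` instead.
 * Returns how many times the source address changed between copies.
 */
size_t copy_loop(char *big, char *small, size_t nslots, int swapped)
{
    const char *prev = NULL;
    size_t changes = 0;

    assert(big != NULL && small != NULL);
    for (size_t i = 0; i < nslots; i++) {
        char *slot = big + i * SLOT;
        char *dst = swapped ? slot : small;
        const char *src = swapped ? small : slot;

        memcpy(dst, src, PKT_LEN);
        if (src != prev) {      /* a new source location was read */
            changes++;
            prev = src;
        }
    }
    return changes;
}
```

With the intended operands the source moves on every iteration, so the reads keep missing once `big` is larger than the caches. With the operands swapped the source never moves, so the reads stay in cache and only the writes see the large buffer.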