false alarm (Re: __builtin_memcpy() slower than memcpy/bcopy (and on linux it is the opposite) ?)

From: Luigi Rizzo <rizzo_at_iet.unipi.it>
Date: Thu, 24 Jan 2013 03:54:42 +0100
On Wed, Jan 23, 2013 at 05:32:38PM +0100, Luigi Rizzo wrote:
> Probably our compiler folks have some ideas on this...
> 
> When doing netmap i found that on FreeBSD memcpy/bcopy was expensive,
> __builtin_memcpy() was even worse, and so i ended up writing
> my custom routine, (called pkt_copy() in the program below).
> This happens with gcc 4.2.1, clang, gcc 4.6.4
> 
> I was then surprised to notice that on a recent ubuntu using
> gcc 4.6.2 (if that matters) the __builtin_memcpy beats other
> methods by a large factor.

so, it turns out that in my test program I had swapped the
source and destination operands for __builtin_memcpy(), and
this substantially changed the memory access pattern.

With the correct operands, __builtin_memcpy == memcpy == bcopy
on both FreeBSD and Linux.
On FreeBSD pkt_copy is still faster than the other methods for
small packets, whereas on Linux they are equivalent.

If you are curious why swapping source and dst changed things
so dramatically:

the test was supposed to read from a large chunk of
memory (over 1GB) to avoid always hitting L1 or L2.
Swapping the operands makes the reads always hit the same
cache line, thus saving a lot of misses. The difference between
the two machines is then probably due to how the cache is used
on writes.
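
For illustration only, here is a minimal sketch of the two access
patterns. This is not the actual test program: copy_loop(), STRIDE
and the buffer names are made up for the example, and the 2048-byte
stride is just an assumed slot size. With swapped == 0 the reads walk
through the large buffer (the intended behaviour); with swapped != 0
the reads always come from the same small buffer and only the writes
walk through the large one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STRIDE 2048	/* assumed distance between packet slots */

/*
 * Copy `count` packets of `len` bytes between a large buffer `big`
 * and a single packet-sized buffer `pkt`.
 *
 * swapped == 0: source moves through `big`, destination is `pkt`.
 * swapped != 0: operands exchanged, so every read hits `pkt`
 *               and the writes move through `big`.
 *
 * Returns the offset into `big` that the next copy would use.
 */
size_t
copy_loop(char *big, size_t big_size, char *pkt, size_t len,
    size_t count, int swapped)
{
	size_t off = 0, i;

	assert(len > 0 && len <= STRIDE && STRIDE <= big_size);
	for (i = 0; i < count; i++) {
		if (swapped)
			memcpy(big + off, pkt, len);	/* reads: same lines */
		else
			memcpy(pkt, big + off, len);	/* reads: new lines */
		off += STRIDE;
		if (off + STRIDE > big_size)	/* wrap before overrun */
			off = 0;
	}
	return off;
}
```

To see the cache effect, `big` would have to be much larger than the
last-level cache (over 1GB in the test above) and the loop would have
to be timed for both values of `swapped`.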

sorry for the noise.
luigi
Received on Thu Jan 24 2013 - 01:55:05 UTC
