Re: [PATCH] Mantaining turnstile aligned to 128 bytes in i386 CPUs

From: Chuck Swiger <cswiger_at_mac.com> Date: Thu, 18 Jan 2007 14:47:55 -0800

On Jan 18, 2007, at 2:28 PM, Maxim Sobolev wrote:
>> Unfortunately, there are simply different tradeoffs between  
>> mechanisms for copying depending on whether you want to use or  
>> avoid using/thrashing the L1/L2 caches, whether the data is cache- 
>> aligned, and so forth; the CPU can't infer what you want to  
>> occur-- you have to tell it.  I find it interesting that some of  
>> the architectures (PA-RISC,
>
> Well, of course there are some special cases, but in general there
> should be some baseline suitable for most uses. That's why we
> (and most other operating systems) only provide a single version of
> the mem*(3) APIs.

Well, a truly generic version is in lib/libc/string/bcopy.c; it's
architecture-neutral (i.e., it's pure C code) and it handles all kinds
of things like overlapping source and destination addresses,
non-aligned access, and so forth.  The downside is that it's slower
than using movl/movsl, let alone some of the fancier variants that
Bruce and Matt have been discussing (in considerable, interesting
detail) earlier:

   http://now.cs.berkeley.edu/Td/bcopy.html
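
To make the overlap handling concrete, here is a minimal byte-at-a-time sketch in portable C.  It is not the code in lib/libc/string/bcopy.c; the name sketch_bcopy is made up for this example, and it makes no attempt to go faster than one byte per iteration.  It only shows why the copy direction has to depend on which buffer starts first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical illustration only, not the libc implementation.
 * Argument order follows bcopy(3): source first, then destination.
 */
void
sketch_bcopy(const void *src, void *dst, size_t len)
{
	const unsigned char *s = src;
	unsigned char *d = dst;

	assert(len == 0 || (s != NULL && d != NULL));

	if (len == 0 || d == s)
		return;

	/*
	 * Compare the addresses as integers; this assumes a flat
	 * address space, which holds on i386 and similar machines.
	 */
	if ((uintptr_t)d < (uintptr_t)s) {
		/* Destination starts first: copy forward. */
		while (len--)
			*d++ = *s++;
	} else {
		/* Destination starts later: copy backward from the end. */
		s += len;
		d += len;
		while (len--)
			*--d = *--s;
	}
}
```

Copying backward when the destination starts after the source means each source byte is read before it can be overwritten, which is the property a naive forward loop loses.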

If you're only moving, say, 5 bytes, the setup overhead of fancy loop
unrolling, prefetching, and so forth will cost more than it saves
compared with a simple movb/movl combination, so which routine is best
really depends on the copy being done.
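
One common way to handle that tradeoff is to branch on the length before picking a strategy.  The sketch below is only an illustration under stated assumptions: sketch_copy and the 16-byte cutoff are invented for this example (the cutoff is not a measured value), the buffers are assumed not to overlap, and the standard memcpy() stands in for whatever tuned bulk routine a platform might provide.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed cutoff for illustration; a real value would come from benchmarks. */
enum { SKETCH_SMALL_COPY_MAX = 16 };

/*
 * Hypothetical size-dispatching copy for non-overlapping buffers.
 * Short copies use a plain byte loop with no setup cost; longer
 * ones are handed to memcpy() as a stand-in for a tuned routine.
 */
void *
sketch_copy(void *dst, const void *src, size_t len)
{
	assert(len == 0 || (dst != NULL && src != NULL));

	if (len <= SKETCH_SMALL_COPY_MAX) {
		unsigned char *d = dst;
		const unsigned char *s = src;

		while (len--)
			*d++ = *s++;
		return (dst);
	}
	return (memcpy(dst, src, len));
}
```

Where the cutoff belongs depends on the CPU, the alignment of the buffers, and whether the data should end up in cache, so it has to be measured rather than guessed.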

-- 
-Chuck