Re: [PATCH] Maintaining turnstile aligned to 128 bytes in i386 CPUs

From: Ricardo Nabinger Sanchez <rnsanchez_at_wait4.org> Date: Wed, 17 Jan 2007 13:41:00 -0200

On Wed, 17 Jan 2007 15:50:41 +1100 (EST)
Bruce Evans <bde_at_zeta.org.au> wrote:

> AXP: (my 5 year old system with a newer CPU): movq through MMX is 60%
>     faster than movsl for cached moves, but movdqa through XMM is only 4%
>     faster.  movnt with block prefetch is 155% faster than movsl with no
>     prefetch, and 73% faster with no prefetch for both.
> A64 in 32-bit mode: in between P4 and AXP (closer to AXP).  movsl doesn't
>     lose by so much, and prefetchnta actually works so block prefetch is
>     not needed and there is a better chance of prefetching helping more
>     than benchmarks.

This PDF is somewhat dated, but perhaps some of it still applies today:

http://cdrom.amd.com/devconn/events/AMD_block_prefetch_paper.pdf
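
For anyone unfamiliar with the term, here is a minimal sketch of the
block prefetch idea in portable C.  It is not the code from the paper or
from the benchmarks quoted above.  The function name, the 64-byte line
size and the 4 KB block size are assumptions made for illustration, and
a plain memcpy() stands in for the movnt stores.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CACHE_LINE 64    /* assumed cache line size in bytes */
#define BLOCK_SIZE 4096  /* assumed block size; tune per CPU */

/*
 * Hypothetical helper: copy len bytes from src to dst one block at a
 * time.  Each block is first "touched" with one load per cache line so
 * that the whole block is pulled into cache, and only then copied.
 */
static void
block_prefetch_copy(void *dst, const void *src, size_t len)
{
	unsigned char *d = dst;
	const unsigned char *s = src;
	volatile unsigned char sink = 0;  /* keeps the loads from being optimized away */

	assert(dst != NULL && src != NULL);

	while (len > 0) {
		size_t chunk = len < BLOCK_SIZE ? len : BLOCK_SIZE;
		size_t off;

		/* Phase 1: one load per cache line of the source block. */
		for (off = 0; off < chunk; off += CACHE_LINE)
			sink = s[off];

		/* Phase 2: copy the block, which should now be cached. */
		memcpy(d, s, chunk);

		d += chunk;
		s += chunk;
		len -= chunk;
	}
	(void)sink;
}
```

Whether this helps at all depends on the CPU, as the numbers quoted
above suggest, so it would need measuring on each target.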

-- 
Ricardo Nabinger Sanchez     <rnsanchez_at_{gmail.com,wait4.org}>
Powered by FreeBSD

  "Left to themselves, things tend to go from bad to worse."