Optimized copy&move (was: Re: [PATCH] Mantaining turnstile aligned to 128 bytes in i386 CPUs)

From: Ivan Voras <ivoras_at_fer.hr>
Date: Wed, 17 Jan 2007 20:41:44 +0100
Bruce Evans wrote:

> And MMX/XMM registers are not needed to get movnt on machines with SSE2,
> since movnti is part of SSE2.  This reduces the advantages of using MMX/XMM
> registers on P4's and A64's in 32-bit mode to the non-nt parts of the
> above (fully cached case), which I think are less important than the nt
> parts.

Hmm, I'm looking at i386/i386/support.s and there are several versions
of bcopy and bmove functions, including some that optimize by using FPU
registers (large_i586_bcopy_loop), and a version that uses movnti
(sse2_pagezero), but I can't find the bit of magic which glues them to
the bzero() call.
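
To make concrete what kind of "glue" I'm looking for: one common way to
do this is an indirect call through a function pointer that is set once
the CPU's features are known. The sketch below is only a generic
illustration of that pattern, not the code in support.s. All the names
in it (my_bzero, my_bzero_vector, select_bzero, fast_bzero) are
hypothetical, and the "fast" routine is a stand-in that just counts its
calls rather than using movnti or FPU registers.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout; this is NOT the code in support.s. */
typedef void (*bzero_fn)(void *buf, size_t len);

/* Baseline version that works on any CPU. */
void generic_bzero(void *buf, size_t len)
{
    memset(buf, 0, len);
}

/*
 * Stand-in for a tuned version (for example one built on movnti).
 * It only counts its calls so that the dispatch is observable.
 */
unsigned long fast_bzero_calls;

void fast_bzero(void *buf, size_t len)
{
    fast_bzero_calls++;
    memset(buf, 0, len);
}

/* The "glue": callers go through this pointer; default is the baseline. */
bzero_fn my_bzero_vector = generic_bzero;

/* Called once at startup, after CPU feature detection (assumed here). */
void select_bzero(int cpu_has_fast_path)
{
    my_bzero_vector = cpu_has_fast_path ? fast_bzero : generic_bzero;
}

/* Public entry point: a single indirect call through the vector. */
void my_bzero(void *buf, size_t len)
{
    my_bzero_vector(buf, len);
}
```

With this layout callers only ever see my_bzero(); which implementation
runs depends on what select_bzero() stored in the pointer.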

Also, as far as I can tell from the comments, the FPU version works by
manually saving context... why is this possible (i.e. won't something
preempt it?)


Received on Wed Jan 17 2007 - 19:09:24 UTC
