Bruce Evans wrote:
> And MMX/XMM registers are not needed to get movnt on machines with SSE2,
> since movnti is part of SSE2. This reduces the advantages of using MMX/XMM
> registers on P4's and A64's in 32-bit mode to the non-nt parts of the
> above (fully cached case), which I think are less important than the nt
> parts.

Hmm, I'm looking at i386/i386/support.s and there are several versions
of the bcopy and bmove functions, including some that optimize by using
FPU registers (large_i586_bcopy_loop) and a version that uses movnti
(sse2_pagezero), but I can't find the bit of magic which glues them to
the bzero() call.

Also, as far as I can tell from the comments, the FPU version works by
manually saving context... why is this possible (i.e. won't something
preempt it?)
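For reference, one common way to get this kind of "glue" is a function pointer that is set once at startup and called through a thin wrapper. The sketch below is only an illustration of that general technique in portable C; it is not taken from support.s. All names in it (bzero_fn, bzero_vector, generic_bzero, fast_bzero, select_bzero, my_bzero) and the cpu_has_fast_path flag are hypothetical, and the "fast" variant just calls memset() as a stand-in for an FPU or movnti loop.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical signature shared by every zeroing implementation. */
typedef void (*bzero_fn)(void *buf, size_t len);

/* Baseline implementation that works on any CPU. */
static void generic_bzero(void *buf, size_t len)
{
    memset(buf, 0, len);
}

/* Counts calls so a caller can observe which variant was used. */
static int fast_calls;

/* Stand-in for a CPU-specific variant (an FPU or movnti loop, say). */
static void fast_bzero(void *buf, size_t len)
{
    fast_calls++;
    memset(buf, 0, len);
}

/* The "glue": one pointer, defaulting to the generic version. */
static bzero_fn bzero_vector = generic_bzero;

/* Called once during startup; cpu_has_fast_path is an assumed flag
 * that would come from CPU feature detection. */
static void select_bzero(int cpu_has_fast_path)
{
    if (cpu_has_fast_path)
        bzero_vector = fast_bzero;
}

/* What callers actually use; it only forwards through the pointer. */
static void my_bzero(void *buf, size_t len)
{
    assert(bzero_vector != NULL);
    bzero_vector(buf, len);
}
```

With this layout, callers only ever see my_bzero(); calling select_bzero(1) once at startup is enough to switch every later call to the specialized routine.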