Re: [PATCH] Maintaining turnstile aligned to 128 bytes in i386 CPUs

From: Attilio Rao <attilio_at_freebsd.org>
Date: Wed, 17 Jan 2007 01:55:09 +0100

2007/1/17, Maxim Sobolev <sobomax_at_freebsd.org>:
> Attilio Rao wrote:
> > 2007/1/17, Ivan Voras <ivoras_at_fer.hr>:
> >> Kip Macy wrote:
> >> > On 1/16/07, Ivan Voras <ivoras_at_fer.hr> wrote:
> >> >> But it does seem to hurt the performance a bit - maybe it's time to
> >> >> add another CPU option like I586_CPU and I686_CPU?
> >> >
> >> > Unless there is a compelling reason not to do so, I think that that
> >> > would be a good idea.
> >>
> >> Maybe someone will even find a way to get optimized versions of
> >> memcpy into the kernel :)
> >>
> >> I was thinking: AFAIK the only major stopper is context saving of the
> >> various "auxiliary" registers - FPU, MMX, SSE, right? But is it an
> >> all-or-nothing situation? I.e. does it make sense (can it be done?) to
> >> just elect to save the MMX context? (AFAIK they are different registers
> >> than SSE, but overlay FPU registers?) The idea is to save something
> >> smaller than the full set.
> >
> > When I implemented FPU copy in the kernel I thought a lot about
> > this, and I think it is possible, at least with some restrictions.
> > For example, for an xmm copy you would just save the contents of 8
> > registers, but you have to ensure that no pending FPU exception will
> > break your kernel, so you should either preserve a clean copy of the
> > FPU state or treat the corner cases you can get.
> > For xmm, after some very productive discussions with bde_at_, we
> > concluded that it should be pretty safe to just have a 16-byte
> > aligned buffer for saving the registers (this way you can use 8
> > movdqa instructions to save them), but I didn't end up playing with
> > it.
> > (My implementation should also deal with the problem of scheduler
> > pinning, in order to avoid reading the wrong per-CPU data.)
>
> I might be wrong, but I think DragonFly solved this issue (i.e.
> optimized memcpy in the kernel) somehow quite some time ago.

DragonFly saves the whole context (xmm + mmx + fpu state). That
mechanism is too heavy for us ATM (and for them too, I suspect). They
don't need to deal with pinning either, BTW.
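
To make the xmm-only idea quoted above a bit more concrete, here is a
rough userland sketch, assuming GCC/Clang inline asm on an x86 CPU with
SSE2. The names xmm_save_area, xmm_save and xmm_restore are made up for
illustration and are not existing kernel code. The sketch leaves out the
hard parts discussed above (pending FPU exceptions, pinning the thread
to its CPU).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical save area: 8 XMM registers x 16 bytes each.
 * It is 16-byte aligned because movdqa faults on unaligned operands.
 */
struct xmm_save_area {
	_Alignas(16) uint8_t reg[8][16];
};

/* Store %xmm0-%xmm7 into the save area with 8 movdqa instructions. */
static void
xmm_save(struct xmm_save_area *sa)
{
	assert(((uintptr_t)sa & 15) == 0);
	__asm__ __volatile__(
	    "movdqa %%xmm0,   0(%0)\n\t"
	    "movdqa %%xmm1,  16(%0)\n\t"
	    "movdqa %%xmm2,  32(%0)\n\t"
	    "movdqa %%xmm3,  48(%0)\n\t"
	    "movdqa %%xmm4,  64(%0)\n\t"
	    "movdqa %%xmm5,  80(%0)\n\t"
	    "movdqa %%xmm6,  96(%0)\n\t"
	    "movdqa %%xmm7, 112(%0)\n\t"
	    : /* no outputs */
	    : "r" (sa)
	    : "memory");
}

/* Reload %xmm0-%xmm7 from the save area. */
static void
xmm_restore(const struct xmm_save_area *sa)
{
	assert(((uintptr_t)sa & 15) == 0);
	__asm__ __volatile__(
	    "movdqa   0(%0), %%xmm0\n\t"
	    "movdqa  16(%0), %%xmm1\n\t"
	    "movdqa  32(%0), %%xmm2\n\t"
	    "movdqa  48(%0), %%xmm3\n\t"
	    "movdqa  64(%0), %%xmm4\n\t"
	    "movdqa  80(%0), %%xmm5\n\t"
	    "movdqa  96(%0), %%xmm6\n\t"
	    "movdqa 112(%0), %%xmm7\n\t"
	    : /* no outputs */
	    : "r" (sa)
	    : "xmm0", "xmm1", "xmm2", "xmm3",
	      "xmm4", "xmm5", "xmm6", "xmm7", "memory");
}
```

A copy routine would call xmm_save() first, use the xmm registers for
the copy, and call xmm_restore() before returning. That is 128 bytes of
state instead of the whole xmm + mmx + fpu context.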

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein
Received on Tue Jan 16 2007 - 23:55:12 UTC
