2012/5/3, Steven Atreju <snatreju_at_googlemail.com>:
> K. Macy wrote [2012-05-03 02:58+0200]:
>> It's highly chipset and processor dependent what works best.
>
> Yes, of course.
> Though i was kinda, even shocked, once i've seen this first:
>
>   http://marc.info/?l=dragonfly-commits&m=132241713812022&w=2
>
> So we don't use our assembler version for new gccs and HAMMER or
> SSE3+ (the decision for these was rather arbitrarily, except they
> were yet existent for an instant implementation).
>
>> Intel now has non-temporal loads and stores which work much
>> better in some cases but provide little benefit in others.
>
> Yes, our 2002 tests have shown that these were *extremely*
> dependent upon alignment.  (Note: 2002. o-)
> Hmm, it doesn't really matter, but i guess this is a good time to
> thank the FreeBSD hackers for that FPU stack FILD/FISTP idea!
> I'll append the copy related notes of our doc/memperf.txt.
>

Thanks.
Years ago I implemented FPU unwinding and MMX copy (reimplementing
bcopy, memcpy, etc.) to see whether they really made a difference.
With the hardware available at that time (Pentium 4), what really
mattered was alignment and the use of non-temporal operations on
heavily contended cache lines.

In short, it is more important that we engineer the "buffer" layout
than the functions themselves.

Attilio

-- 
Peace can only be achieved by understanding - A. Einstein

Received on Thu May 03 2012 - 08:49:40 UTC
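As an illustration of the "buffer layout" point, here is a minimal C11 sketch of laying out contended data so that each slot sits on its own cache line. It is not taken from the thread or from any FreeBSD or DragonFly source: the 64-byte line size, the `padded_counter` type, and the helpers `counters_alloc` and `is_line_aligned` are all assumptions made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: 64-byte cache lines. Common on x86, but not guaranteed. */
#define CACHE_LINE 64

/*
 * Hypothetical per-CPU (or per-thread) counter slot. Aligning the member
 * to CACHE_LINE also rounds sizeof(struct padded_counter) up to one full
 * line, so adjacent array elements never share a cache line.
 */
struct padded_counter {
    _Alignas(CACHE_LINE) uint64_t value;
};

_Static_assert(sizeof(struct padded_counter) == CACHE_LINE,
               "each slot should occupy exactly one cache line");

/* Allocate n zeroed slots starting on a cache-line boundary (C11). */
static struct padded_counter *counters_alloc(size_t n)
{
    struct padded_counter *c = aligned_alloc(CACHE_LINE, n * sizeof *c);
    if (c != NULL) {
        for (size_t i = 0; i < n; i++)
            c[i].value = 0;
    }
    return c;
}

/* Return nonzero if p sits on a cache-line boundary. */
static int is_line_aligned(const void *p)
{
    return ((uintptr_t)p % CACHE_LINE) == 0;
}
```

A caller would do something like `struct padded_counter *c = counters_alloc(ncpus);`, let each CPU update only its own `c[i].value`, and `free(c)` when done. Whether this helps in practice is, as noted above, highly hardware dependent and needs to be measured.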