Re: SSE in libthr

From: David Chisnall <theraven_at_FreeBSD.org>
Date: Sat, 28 Mar 2015 15:21:07 +0000

On 28 Mar 2015, at 13:54, Julian Elischer <julian_at_freebsd.org> wrote:
> 
> the point is that clang will do this anywhere it can, because it isn't taking into account the
> side effects, just the speed of the commands themselves.

This is also not something that is going to decrease.  Clang now enables the SLP vectoriser by default, and that code is constantly being improved.  Current-generation vector units are explicitly designed as targets for compiler autovectorisation, not for hand-tuned DSP code (which, increasingly, runs on the GPU anyway).  This means that we're increasingly going to see SSE/AVX/NEON usage in CPU-bound code, even without an explicit programmer decision to use it.  Optimising for the case where the vector unit is not used is about as sensible as optimising for the single-core case: it will affect some people, but generally not those who care about performance, and fewer of them over time.
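As a rough illustration of the kind of ordinary code this applies to, here is a minimal sketch. The struct and function names are made up for the example and are not taken from libthr or any real codebase. Neither function mentions SSE anywhere, yet both are the sort of thing an optimising build may turn into vector instructions. Whether that actually happens depends on the compiler version, optimisation level and target.

```c
#include <stddef.h>

/* Hypothetical type for illustration only. */
struct vec4 {
    float x, y, z, w;
};

/*
 * Four independent scalar adds on adjacent fields.  Straight-line code
 * like this is a typical candidate for the SLP vectoriser, which may
 * merge the four adds into a single vector add.
 */
struct vec4
vec4_add(struct vec4 a, struct vec4 b)
{
    struct vec4 r;

    r.x = a.x + b.x;
    r.y = a.y + b.y;
    r.z = a.z + b.z;
    r.w = a.w + b.w;
    return r;
}

/*
 * A plain counted loop over non-aliasing arrays.  This is a typical
 * candidate for the loop vectoriser; the restrict qualifiers tell the
 * compiler that dst and src do not overlap.
 */
void
scale_add(float *restrict dst, const float *restrict src, float k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] += k * src[i];
}
```

To see what the compiler did with code like this, building with something like `clang -O2 -Rpass=slp-vectorizer -Rpass=loop-vectorize` should report the transformations it applied, and `-fno-slp-vectorize -fno-vectorize` gives the non-vectorised output for comparison.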

David