In a nutshell: Clang emits SSE instructions on amd64 in the common path of pthread_mutex_unlock. This reduces performance by a non-trivial amount. I'd like to disable SSE in libthr.

In more detail: in libthr/thread/thr_mutex.c, we find the following:

    #define MUTEX_INIT_LINK(m)  do {        \
            (m)->m_qe.tqe_prev = NULL;      \
            (m)->m_qe.tqe_next = NULL;      \
    } while (0)

In 9.1, clang 3.1 emits two ordinary mov instructions:

    movq   $0x0,0x8(%rax)
    movq   $0x0,(%rax)

Since 10.0 and clang 3.3, clang emits these SSE instructions:

    xorps  %xmm0,%xmm0
    movups %xmm0,(%rax)

Although these look harmless enough, using the FPU can reduce performance by incurring extra overhead from context-switching the FPU state. On amd64 the kernel switches FPU state lazily, so the first FPU or SSE instruction a thread executes after a context switch takes a device-not-available trap, handled by fpudna, to restore that thread's FPU state. As I mentioned, this code is used in the common path of pthread_mutex_unlock.

I have a simple test program that creates four threads, all contending for a single mutex, and measures the total number of lock acquisitions over several seconds. When libthr is built with SSE, as it currently is, I get around 53 million locks in 5 seconds. Without SSE, I get around 60 million (13% more). DTrace shows around 790,000 calls to fpudna with SSE versus 10 calls without it. There could be other factors involved, but I presume that the FPU context switches account for most of the change in performance.

Even when I add some SSE usage in the application (incidentally, these same instructions), building libthr without SSE improves performance from 53.5 million to 55.8 million (4.3%). In the real-world application where I first noticed this, performance improves by 3-5%.

I would appreciate your thoughts and feedback. The proposed patch is below.
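The test program itself is not included in the post. What follows is a minimal sketch of a comparable contention benchmark, assuming POSIX threads, C11 atomics and nanosleep(); the names run_contention, worker_main and MAX_THREADS are hypothetical, and the thread count and run time are parameters rather than the exact setup measured above.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define MAX_THREADS 64          /* hypothetical upper bound on workers */

static pthread_mutex_t bench_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_bool bench_stop;

struct worker {
	pthread_t     tid;
	unsigned long count;    /* lock acquisitions made by this thread */
};

/* Lock and unlock the shared mutex until the main thread says stop. */
static void *
worker_main(void *arg)
{
	struct worker *w = arg;

	while (!atomic_load_explicit(&bench_stop, memory_order_relaxed)) {
		pthread_mutex_lock(&bench_lock);
		w->count++;             /* trivial critical section */
		pthread_mutex_unlock(&bench_lock);
	}
	return NULL;
}

/*
 * Run nthreads workers contending for one mutex for duration_ms
 * milliseconds and return the total number of lock acquisitions
 * (0 if the arguments are out of range).
 */
unsigned long
run_contention(int nthreads, long duration_ms)
{
	struct worker w[MAX_THREADS];
	struct timespec ts;
	unsigned long total = 0;
	int i, error;

	if (nthreads < 1 || nthreads > MAX_THREADS || duration_ms < 0)
		return 0;

	atomic_store(&bench_stop, false);
	for (i = 0; i < nthreads; i++) {
		w[i].count = 0;
		error = pthread_create(&w[i].tid, NULL, worker_main, &w[i]);
		if (error != 0) {
			fprintf(stderr, "pthread_create: error %d\n", error);
			abort();
		}
	}

	/* Let the workers contend for the requested interval. */
	ts.tv_sec = duration_ms / 1000;
	ts.tv_nsec = (duration_ms % 1000) * 1000000L;
	while (nanosleep(&ts, &ts) == -1 && errno == EINTR)
		;               /* resume sleeping after a signal */

	atomic_store(&bench_stop, true);
	for (i = 0; i < nthreads; i++) {
		pthread_join(w[i].tid, NULL);
		total += w[i].count;    /* safe to read after join */
	}
	return total;
}
```

For example, run_contention(4, 5000) roughly mirrors the four-thread, five-second run described above. Built with cc -O2 -pthread and run once against the stock libthr and once against a libthr rebuilt with the patch, it should give comparable totals; the absolute numbers will of course depend on the machine.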
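To see which instructions a given compiler chooses for this pattern, the macro can be isolated in a small translation unit. This is a sketch: struct demo_mutex and demo_init_link are hypothetical stand-ins, with a linkage struct laid out like a <sys/queue.h> TAILQ_ENTRY (next pointer at offset 0, prev pointer at offset 8 on amd64, matching the offsets in the assembly above). Whether the two 8-byte zero stores are merged into one 16-byte SSE store depends on the compiler version and flags.

```c
#include <stddef.h>

/* Hypothetical stand-in for the mutex; only the queue linkage matters. */
struct demo_mutex {
	struct {
		struct demo_mutex  *tqe_next;   /* offset 0 */
		struct demo_mutex **tqe_prev;   /* offset 8 on amd64 */
	} m_qe;
};

/* Same shape as MUTEX_INIT_LINK in libthr/thread/thr_mutex.c. */
#define MUTEX_INIT_LINK(m)  do {        \
	(m)->m_qe.tqe_prev = NULL;      \
	(m)->m_qe.tqe_next = NULL;      \
} while (0)

/* External function so its body is easy to find in the -S output. */
void
demo_init_link(struct demo_mutex *m)
{
	MUTEX_INIT_LINK(m);
}
```

Compiling this with cc -O2 -S and again with cc -O2 -mno-sse -S, then comparing the code generated for demo_init_link, shows whether the compiler combined the two adjacent zero stores into an xmm store.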
Eric

Index: base/head/lib/libthr/arch/amd64/Makefile.inc
===================================================================
--- base/head/lib/libthr/arch/amd64/Makefile.inc (revision 280703)
+++ base/head/lib/libthr/arch/amd64/Makefile.inc (working copy)
@@ -1,3 +1,8 @@
 #$FreeBSD$
 
 SRCS+= _umtx_op_err.S
+
+# Using SSE incurs extra overhead per context switch,
+# which measurably impacts performance when the application
+# does not otherwise use FP/SSE.
+CFLAGS+=-mno-sse

Received on Fri Mar 27 2015 - 18:27:27 UTC
This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:56 UTC