Re: [PATCH] microoptimize locking primitives by introducing randomized delay between atomic ops

From: Konstantin Belousov <kostikbel_at_gmail.com>
Date: Sun, 10 Jul 2016 15:22:47 +0300

On Sun, Jul 10, 2016 at 01:13:26PM +0200, Mateusz Guzik wrote:
> If the lock is contended, primitives like __mtx_lock_sleep will spin
> checking if the owner is running or the lock was freed. The problem is
> that once it is discovered that the lock is free, multiple CPUs are
> likely to try to do the atomic op which will make it more costly for
> everyone and throughput suffers.
> 
> The standard thing to do is to have some sort of a randomized delay so
> that this kind of behaviour is reduced.
> 
> As such, below is a trivial hack which takes cpu_ticks() into account
> and performs % 2048, which in my testing gives reasonably good results.
> 
> Please note there is definitely way more room for improvement in general.
> 
> In terms of results, there was no statistically significant change in
> -j 40 buildworld nor buildkernel.
> 
> However, a 40-way find on a ports tree placed on tmpfs yielded the following:

I am curious why you added the randomizer to the sx adaptive loop but not
to the lockmgr loop and, probably most important, to the spinlocks (unless
I misread the patch).
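
For readers without the patch at hand, the idea in the quoted description
can be illustrated with a minimal userland sketch. This is not the code
from the patch: fake_cpu_ticks() is a hypothetical stand-in for the
kernel's cpu_ticks(), the lock is a plain C11 atomic_int rather than a
struct mtx or struct sx, and the busy loop in spin_delay() only
approximates what cpu_spinwait() would do in the kernel. The only detail
taken from the description is the "% 2048" bound on the delay.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Upper bound on the delay; mirrors the "% 2048" from the description. */
#define	BACKOFF_MAX	2048u

/*
 * Hypothetical stand-in for cpu_ticks(): a xorshift generator.
 * The state must be non-zero, otherwise it stays at zero forever.
 */
static uint64_t
fake_cpu_ticks(uint64_t *state)
{
	uint64_t x = *state;

	assert(x != 0);
	x ^= x << 13;
	x ^= x >> 7;
	x ^= x << 17;
	*state = x;
	return (x);
}

/* Map a tick value to a number of delay iterations in [0, BACKOFF_MAX). */
static unsigned
backoff_delay(uint64_t ticks)
{
	return ((unsigned)(ticks % BACKOFF_MAX));
}

/* Burn roughly n loop iterations (assumed substitute for cpu_spinwait()). */
static void
spin_delay(unsigned n)
{
	for (volatile unsigned i = 0; i < n; i++)
		;
}

/* Single attempt at the atomic op: 0 means free, 1 means held. */
static int
lock_try(atomic_int *lk)
{
	int expected = 0;

	return (atomic_compare_exchange_strong(lk, &expected, 1));
}

static void
lock_acquire(atomic_int *lk, uint64_t *seed)
{
	for (;;) {
		/* Read-only spin while the lock is held. */
		while (atomic_load_explicit(lk, memory_order_relaxed) != 0)
			;
		/*
		 * The lock looks free.  Wait a pseudo-random time first so
		 * that the waiters do not all issue the atomic op at once.
		 */
		spin_delay(backoff_delay(fake_cpu_ticks(seed)));
		if (lock_try(lk))
			return;
	}
}

static void
lock_release(atomic_int *lk)
{
	atomic_store(lk, 0);
}
```

Each waiter would keep its own seed (per-thread or per-CPU), so that
different CPUs draw different delays after observing the same unlock.
The same delay step could in principle be placed in any loop that retries
an atomic op after seeing the lock free, which is what the question about
lockmgr and the spinlocks is getting at.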