Re: Examining the VM splay tree effectiveness

From: Ivan Voras <ivoras_at_freebsd.org>
Date: Sun, 03 Oct 2010 20:24:27 +0200
On 10/01/10 10:54, Andre Oppermann wrote:
> On 30.09.2010 19:51, Ivan Voras wrote:
>> On 09/30/10 18:37, Andre Oppermann wrote:
>>
>>> Both the vmmap and page table make use of splay trees to manage the
>>> entries and to speed up lookups compared to long-to-traverse linked
>>> lists or more memory-expensive hash tables.  Some structures, though,
>>> do have an additional linked list to simplify ordered traversals.
>>
>> The property of splay trees requiring *writes* for nearly every read
>> really is a thorn in the side for SMP. It seems to me that even if the
>> immediate benefits of converting to something else are not directly
>> observable, it would still be worth doing.
> 
> Fully agreed.
> 
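To make the write-on-read point concrete: even a *successful* lookup in a
splay tree rotates the found node toward the root, so every reader keeps
dirtying shared cache lines. A minimal sketch (hypothetical code, not the
actual sys/vm implementation -- a naive rotate-to-root stands in for the
real zig-zig/zig-zag splay steps):

```c
#include <stddef.h>

/* Toy splay-like tree: lookups mutate the structure. */
struct node {
	int key;
	struct node *left, *right, *parent;
};

/* Single rotation moving 'n' one level up toward the root. */
static void
rotate_up(struct node **root, struct node *n)
{
	struct node *p = n->parent;

	if (p == NULL)
		return;
	if (p->left == n) {			/* right rotation */
		p->left = n->right;
		if (n->right)
			n->right->parent = p;
		n->right = p;
	} else {				/* left rotation */
		p->right = n->left;
		if (n->left)
			n->left->parent = p;
		n->left = p;
	}
	n->parent = p->parent;
	p->parent = n;
	if (n->parent == NULL)
		*root = n;
	else if (n->parent->left == p)
		n->parent->left = n;
	else
		n->parent->right = n;
}

/* Lookup that rotates the hit to the root: the read is also a write. */
static struct node *
lookup(struct node **root, int key)
{
	struct node *n = *root;

	while (n && n->key != key)
		n = key < n->key ? n->left : n->right;
	if (n)
		while (n->parent)	/* rewrite the tree on every hit */
			rotate_up(root, n);
	return (n);
}
```

On a single CPU those writes pay off (hot entries migrate to the root);
on SMP every lookup invalidates the other CPUs' cached copies of the
rotated nodes, which is exactly the problem discussed above.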
>> It's a shame that RCU is still a patent minefield :/
>>
>> http://mirror.leaseweb.com/kernel/people/npiggin/patches/lockless/2.6.16-rc5/radix-intro.pdf
>>
> 
> I'm not convinced that RCU is the only scalable way of sharing a
> data structure across a possibly large number of CPUs.

Of course; it's just the best-understood approach at the moment.

> The term "lockless" is often used and frequently misunderstood.
> Having a lockless data structure *doesn't* mean that it is either
> performant, scalable, or both.  It heavily depends on a number
> of additional factors.  Many times "lockless" just replaces a
> simple lock/unlock cycle with a number of atomic operations on
> the data structure.  This can easily backfire because an atomic

Yes.

> operation just hides the computational complexity and also dirties
> the CPU's cache lines.  Generally on cache-coherent architectures
> almost all of the atomic operation is in HW with bus lock cycles,
> bus snooping and whatnot.  While seemingly simple from the programmer's
> point of view, the overhead and latency are still there.  Needless

Yes, you basically just offload the operation to the hardware, but the
steps it needs to take are conceptually the same.
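A Treiber stack illustrates the point: the classic "lockless" structure
turns one lock/unlock pair into a CAS retry loop, and under contention
every failed CAS is another full atomic bus cycle. A rough sketch
(illustrative only -- a real kernel version would also have to handle the
ABA problem and safe memory reclamation):

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Treiber stack: lockless LIFO built on compare-and-swap. */
struct snode {
	int val;
	struct snode *next;
};

static _Atomic(struct snode *) top;

static void
push(int v)
{
	struct snode *n = malloc(sizeof(*n));

	n->val = v;
	n->next = atomic_load(&top);
	/* Each failed CAS below is one more atomic op on a hot line. */
	while (!atomic_compare_exchange_weak(&top, &n->next, n))
		;
}

static int
pop(int *out)
{
	struct snode *n = atomic_load(&top);

	while (n && !atomic_compare_exchange_weak(&top, &n, n->next))
		;
	if (n == NULL)
		return (0);
	*out = n->val;
	free(n);
	return (1);
}
```

No mutex anywhere, yet every push/pop still executes at least one locked
read-modify-write cycle on the shared 'top' pointer -- and possibly many
more when several CPUs collide on it.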

>  a) make sure the lock is held for only a small amount of time
>     to avoid lock contention.
>  b) do everything you can outside of the lock.
>  c) if the lock is found to be heavily contended rethink the
>     whole approach and check if other data structures can be used.
>  d) minimize write accesses to memory in the lock protected
>     shared data structure.
>  e) PROFILE, DON'T SPECULATE! Measure the access pattern and
>     measure the locking/data access strategy's cost in terms
>     of CPU cycles consumed.
> 
>  f) on lookup-heavy data structures avoid writing to memory and
>     thereby dirtying CPU caches.
>  g) on modify-heavy data structures avoid touching too many
>     elements.
>  h) on lookup- and modify-heavy data structures that are used
>     across many CPUs all bets are off and a different data
>     structure approach should be considered, resulting ideally
>     in case f).
> 
> It all starts with the hypothesis that a data structure is not
> optimally locked.
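Points (a), (b) and (d) in practice: construct the element completely
outside the lock so the critical section shrinks to a couple of pointer
writes. A hypothetical sketch, with a test-and-set spinlock standing in
for a real mutex:

```c
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative list insert following guidelines (a), (b), (d). */
static atomic_flag list_lock = ATOMIC_FLAG_INIT;

struct entry {
	char name[32];
	struct entry *next;
};

static struct entry *head;

static void
insert(const char *name)
{
	/* (b) Everything expensive happens before taking the lock. */
	struct entry *e = malloc(sizeof(*e));

	strncpy(e->name, name, sizeof(e->name) - 1);
	e->name[sizeof(e->name) - 1] = '\0';

	while (atomic_flag_test_and_set_explicit(&list_lock,
	    memory_order_acquire))
		;			/* (a) hold time is tiny */
	e->next = head;			/* (d) only two writes under */
	head = e;			/*     the lock              */
	atomic_flag_clear_explicit(&list_lock, memory_order_release);
}
```

The same structure with malloc() and strncpy() inside the critical
section would hold the lock for orders of magnitude longer and contend
far sooner.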

This looks like material for a developer-centric wiki page :) There is a
lot of dispersed wisdom in this thread which it would be nice to gather
in one place.
Received on Sun Oct 03 2010 - 16:30:08 UTC
